Global ETD Search

731	Datamining a využití rozhodovacích stromů při tvorbě Scorecards / Data Mining and use of decision trees by creation of Scorecards Straková, Kristýna January 2014 The thesis compares several selected modeling methods that financial institutions use in decision-making processes, among other applications. The first, theoretical part describes well-known modeling methods such as logistic regression, decision trees, neural networks, and alternating decision trees, as well as a relatively new method called "Random forest". The practical part outlines some processes within financial institutions in which the selected modeling methods are used. Logistic regression, decision trees, and decision forests are compared with each other on real data from two financial institutions. Neural networks are not included because their models are difficult to interpret. In conclusion, based on the resulting models, the thesis tries to answer whether logistic regression (the method most widely used by financial institutions) remains the most suitable.
732	The Social Environment and the Health Care sector / Sociální prostředí a zdravotnictví odvětví da Rocha Fernandes, Joao Diogo January 2012 The objective of this thesis was to defend an alternative approach for health policy makers: improving health outcomes by investing in the social factors of people's lives rather than by increasing health expenditures. To defend this position, the thesis addresses two research questions: Which social determinants of health have the largest impact on the health status of individuals? And what is the statistical correlation between those determinants and self-reported health status, and psychological health, in Germany, Denmark, Spain, and Ireland? The first question was answered through a comprehensive review of the most relevant literature on the social determinants of health, and the second through the construction of a multiple regression model. According to this study, the social determinants with the largest impact on the health status of individuals are physical activity, education level, the welfare state, emotional support, socio-economic status, living conditions, working conditions, and life balance. In the multiple regression models, all variables followed the expected trend, and a statistically significant correlation was demonstrated for 7 of the 8 chosen determinants, most notably working conditions and life balance: individuals who reported problems managing these aspects of life had 50%, or in some cases 30%, of the health status of individuals with positive experiences in these dimensions.
733	Identifying factors that predict student success in a community college online distance learning course. Welsh, Johnelle Bryson 12 1900 The study's purpose was to identify the demographic, educational background, financial, formal and informal education and experience, reading habit, external environmental, psychological, and computer efficacy factors that predict a student's ability to successfully complete an online (Web-based) distance learning community college course. Major student retention theories and research on student attrition and persistence guided the study. Distance learners (N = 926) completed four surveys, which collected data for 26 predictor variables that included age, gender, marital status, ethnicity, support others, course load, first-time student, last semester attended, student type and location, financial stability, tuition payment, prior learning experiences, reading habits, family support, enrollment encouragement, study encouragement, time management, study environment, employment, extrinsic and intrinsic motivation, locus of control, self-efficacy, computer confidence and skills, and number of prior online courses. Successful or unsuccessful course completion was the dependent variable. Statistical analyses included Cronbach's alpha, Pearson chi-square, two-sample t tests, Pearson correlation, the phi coefficient, and binary logistic regression. The variables in each factor were entered sequentially as a block in separate binary logistic regression models. Statistically significant variables were course load, financial stability, prior learning experiences, time management and study environment, extrinsic motivation, self-efficacy, and computer skills. Selected predictor variables (N = 20) were entered hierarchically into a logistic regression model, in which course load, financial stability, and self-efficacy were statistically significant in the final block.
Correlation coefficients were computed for the statistically significant predictor variables to determine whether their significance was confined to the control group or held at an overall level. Findings were supported through cross-validation and forward stepwise entry of variables in logistic regression. Despite having two or more at-risk factors, distance learners were more likely to be successful if they had high levels of self-efficacy, good computer and time management skills, financial stability, and a favorable study environment, were enrolled in more than one course, and believed their prior learning experiences helped prepare them for their course. Distance learning binary logistic regression student retention computer skills self-efficacy Web-based online Web-based instruction. Community college students. Distance education. Academic achievement.
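The abstract lists Cronbach's alpha among the analyses used to check the reliability of the survey scales. A minimal Python sketch of the standard formula, alpha = k/(k − 1) · (1 − sum of item variances / variance of total scores), is shown below. The function name and the item scores are hypothetical, since the study's actual survey items are not given here.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items; items[i][j] is respondent j's score on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")

    # Sum of the per-item sample variances.
    item_var_sum = sum(variance(item) for item in items)

    # Sample variance of each respondent's total score across all k items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 2-item scale answered by 4 respondents.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Sample variances (n − 1 denominator) are used throughout. Because the same denominator appears on both sides of the ratio, population variances would give the same alpha.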
734	Regressão logística e análise discriminante na predição da recuperação de portfólios de créditos do tipo non-performing loans / Logistic regression and discriminant analysis in prediction of the recovery of non-performing loans credits portfolio Silva, Priscila Cristina 23 February 2017 Customers whose credit agreements are more than 90 days in arrears are classified as non-performing loans. They concern credit companies because there is no guarantee that the debtor will repay the amount owed. Such customers are handled with collection scoring models, whose main objective is to predict which debtors are inclined to honor their debts; in other words, these models focus on credit recovery. Models based on statistical prediction techniques, such as logistic regression and discriminant analysis, can be applied to the recovery of these credits. The aim of this work was therefore to apply logistic regression and discriminant analysis models to predict the recovery of non-performing loan credit portfolios. The database was provided by the company Serasa Experian. It contains a sample of ten thousand customers with twenty independent variables and a binary response (dependent) variable indicating whether or not the defaulting customer paid their debt. The sample was divided into training, validation, and test sets, and the two models were first applied individually. Then, new logistic regression and discriminant analysis models were built from the outputs of the individually implemented models. Both the individual and the new models generally performed well. The new discriminant analysis model stood out, with a higher percentage of correct classifications than the new logistic regression model. Based on these results, the models were concluded to be a good option for predicting the recovery of non-performing loan credit portfolios. collection scoring non-performing loans logistic regression discriminant analysis credit portfolio recovery ENGINEERING::PRODUCTION ENGINEERING
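The abstract describes a two-stage procedure. First, the sample is split into training, validation, and test sets and each base model is fitted on its own. Second, new models are fitted on the base models' outputs. A minimal Python sketch of the data plumbing is shown below. The split proportions, the function names, and the toy scoring functions in the checks are hypothetical, because the thesis's actual proportions and fitted models are not given here.

```python
import random
from typing import Callable, Sequence


def three_way_split(n: int, frac_train: float = 0.6, frac_val: float = 0.2, seed: int = 0):
    """Shuffle row indices 0..n-1 and cut them into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def stacked_features(rows: Sequence, base_models: Sequence[Callable]) -> list[list[float]]:
    """One meta-feature per base model: its predicted repayment probability for each row."""
    return [[model(row) for model in base_models] for row in rows]


def percent_correct(y_true: Sequence[int], p_hat: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of rows whose thresholded prediction matches the observed 0/1 outcome, in %."""
    hits = sum(int(p >= cutoff) == y for y, p in zip(y_true, p_hat))
    return 100.0 * hits / len(y_true)


# Hypothetical 60/20/20 split of a 10,000-row sample.
train, val, test = three_way_split(10_000)
print(len(train), len(val), len(test))
```

A common way to keep the meta-features honest is to fit the stage-two models on base-model outputs for the validation rows, rather than on the rows the base models were trained on. The abstract does not say which partition the thesis used.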
735	Avaliação dos modelos Probit e Logit com aplicação na longevidade de sementes de soja / Evaluation of the Probit and Logit models applied to soybean seed longevity Faria, Rute Quelvia de January 2019 Advisor: Maria Márcia Pereira Sartori / Abstract: The study of longevity is an important tool in the analysis of the physiological quality of seeds. Modeling the seed survival curve allows prediction of the seed half-life, which can serve as a reference for a wide range of studies on seed conservation and technology. The Probit model was initially proposed as the ideal model for predicting seed longevity. However, studies have reported prediction errors when the model is applied under the different stress and storage conditions to which seeds are subjected. The seed viability equation based on the Probit model allows calculation of P50, the period in which a seed lot loses 50% of its viability. The Logit model is similar to the Probit model, with the advantages of being simpler and better suited to heavy-tailed data, as occurs with seed longevity data. The aim of this study was to evaluate the robustness of the Probit and Logit models in predicting seed longevity. For this purpose, soybean seeds were selected according to their vigor in a completely randomized design and stored at 35 °C and 75% relative humidity until their death was verified by periodic germination tests. After the experiment ended, survival curves were constructed, and the Probit and Logit models were analyzed through R², adjusted R², and the Pearson correlation coefficient. The normality of the residuals was also examined to evaluate the models. The results showed the superiority of the Logi... / Doctorate Survival curve Normal distribution Gaussian distribution Electro-energy Link function Glycine max L. Logistic regression
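P50, as the abstract defines it, is the storage period at which a seed lot has lost 50% of its viability. Suppose viability is modeled as F(a + b·t), where F is the standard normal CDF (probit) or the logistic CDF (logit). Both functions equal 0.5 at zero, so under either link the fitted curve crosses 50% where a + b·t = 0, that is, at t = −a/b. In the commonly cited viability-equation form v = K_i − p/σ, this corresponds to a = K_i and b = −1/σ, giving P50 = K_i·σ. The minimal Python sketch below assumes an already-fitted intercept and slope. The coefficient values and function names are hypothetical, and fitting the two models to germination data is not shown.

```python
import math


def probit_link_inverse(eta: float) -> float:
    """Standard normal CDF, i.e. the inverse probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_link_inverse(eta: float) -> float:
    """Logistic CDF, i.e. the inverse logit link (branching avoids overflow in exp)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def viability(t: float, a: float, b: float, inverse_link) -> float:
    """Predicted proportion of viable seeds after storage time t (linear predictor a + b*t)."""
    return inverse_link(a + b * t)


def p50(a: float, b: float) -> float:
    """Storage time at which predicted viability falls to 50%: the root of a + b*t = 0."""
    if b >= 0:
        raise ValueError("viability must decline with storage time (b < 0)")
    return -a / b


# Hypothetical coefficients (t in days, say); probit and logit fits would give different a, b.
a, b = 3.0, -0.125
t50 = p50(a, b)
print(t50, viability(t50, a, b, probit_link_inverse), viability(t50, a, b, logit_link_inverse))
# 24.0 0.5 0.5
```

The two links share the same P50 formula, but they give different P50 values in practice because each link produces its own fitted a and b from the same survival data.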
736	Méthodes des matrices aléatoires pour l’apprentissage en grandes dimensions / Methods of random matrices for large dimensional statistical learning Mai, Xiaoyi 16 October 2019 The BigData challenge creates a need for machine learning algorithms to evolve into more efficient learning engines for large-dimensional data. Recently, a new direction of research has emerged that analyzes learning methods in the modern regime, where the number n and the dimension p of data samples are both large and of comparable size. Compared with the conventional regime where n ≫ p, the regime with large and comparable n and p is particularly interesting. In this regime, learning performance remains sensitive to the tuning of hyperparameters, which opens a path toward understanding and improving learning techniques for large-dimensional datasets. The technical approach of this thesis draws on several advanced tools of high-dimensional statistics, allowing analyses that go beyond the state of the art. The first part of the dissertation studies semi-supervised learning on high-dimensional data. Motivated by our theoretical findings, we propose a superior alternative to the standard semi-supervised method of Laplacian regularization. Methods involving implicit optimization, such as SVMs and logistic regression, are then investigated under realistic mixture models, providing exhaustive details on the learning mechanism. Several important consequences are revealed, some of which even contradict common belief. Large dimensional learning Random matrix theory Semi-Supervised learning Support vector machines Logistic regression
737	Geostatistical Approach to Delineate Wetland Boundaries in the Cutshaw Bog, Tennessee Anderson, Victoria, Shockley, Isaac, Nandi, Arpita, Luffman, Ingrid 05 April 2018 Wetlands are one of the most productive ecosystems in the world, providing a range of services, including water quality improvement, flood mitigation, erosion control, habitat, and carbon storage. Tennessee is estimated to have lost 60% of its original 2 million acres of pre-European-settlement wetlands. Recently, increased funding has been made available for wetland restoration and expansion. In response, the Cherokee National Forest has proposed a range of wetland restoration actions within the Paint Creek Watershed to expand and restore some of the existing bogs and fens. These include the Cutshaw Bog, a 163,864 m² wetland located 32 km south of Greeneville, TN. The U.S. Forest Service has proposed a new, expanded wetland boundary that would result from the restoration efforts. To assess the potential for success, however, current wetland indicators based on soil color, texture, depth, drainage, sulfide materials, and iron concentrations were examined. Sampling locations were identified by overlaying a grid of 64 cells, each 40.5 m × 40.5 m. Soil cores were extracted to a depth of up to 0.6 m from each sampling cell and evaluated in situ for hydric soil properties using the Eastern Mountains and Piedmont Army Corps of Engineers Wetlands Delineation Manual. Soil physical properties (texture, bulk density, moisture content) and chemical properties (pH, cation exchange capacity, % base saturation, nitrogen, Bray II phosphorus, iron, zinc, and total carbon content) were evaluated in the laboratory. Results indicated that 47% of samples taken within the proposed wetland expansion area currently have hydric soil characteristics; these samples were located along drainage lines.
The presence of hydric soils was correlated with soil physicochemical properties, including bulk density, moisture content, sulfur and phosphorus concentrations, iron, and other metals. Statistical analyses for the northern and southern sections of the bog were completed separately, because the two sections are physically divided by a French drain structure. Logistic regression models were developed using the properties most strongly correlated with the presence of hydric soil. For the northern section, bulk density and iron were retained in the model, while for the southern section, iron was retained. A spatial model for the presence of hydric soil was developed by spatially interpolating the covariates through kriging. Next, a probability map was created from the logistic regression equation with raster math in ArcGIS Pro. Results indicate that the Cutshaw Bog cannot be expanded to the originally proposed boundary provided by the U.S. Forest Service, and a new recommended boundary was delineated from the probability map. The results of this data-driven approach will assist the Forest Service in targeted wetland restoration efforts at the Cutshaw Bog. Wetland Restoration Spatial Interpolation Soil Physio-chemical Properties Logistic Regression Hydric Soil Indicators Geographic Information Sciences Natural Resources and Conservation Soil Science Spatial Science
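In the abstract's final modeling step, the logistic regression equation is turned into a probability map by raster math over kriged covariate surfaces. This amounts to evaluating p = 1 / (1 + e^−(b0 + b1·x1 + b2·x2)) in every grid cell. The minimal Python sketch below uses plain nested lists as stand-ins for co-registered rasters. The covariate names follow the northern-section model, but the grid values, the coefficients, and the 0.5 cutoff are hypothetical placeholders. The kriging step itself is not shown, and this is not the ArcGIS Pro workflow.

```python
import math


def logistic(eta: float) -> float:
    """Numerically stable inverse-logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def probability_grid(intercept: float, coefs: dict[str, float],
                     rasters: dict[str, list[list[float]]]) -> list[list[float]]:
    """Apply a fitted logistic equation cell by cell to co-registered covariate rasters."""
    names = list(coefs)
    first = rasters[names[0]]
    n_rows, n_cols = len(first), len(first[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            eta = intercept + sum(coefs[name] * rasters[name][r][c] for name in names)
            row.append(logistic(eta))
        out.append(row)
    return out


def hydric_mask(prob: list[list[float]], cutoff: float = 0.5) -> list[list[int]]:
    """1 where the predicted probability of hydric soil meets the cutoff, else 0."""
    return [[int(p >= cutoff) for p in row] for row in prob]


# Hypothetical 2x2 interpolated surfaces and placeholder coefficients (not the study's fit).
rasters = {"bulk_density": [[1.2, 0.9], [1.4, 0.8]], "iron": [[150.0, 400.0], [90.0, 520.0]]}
coefs = {"bulk_density": -2.0, "iron": 0.01}
prob = probability_grid(-1.0, coefs, rasters)
print(hydric_mask(prob))
```

A recommended boundary would then be traced around the cells flagged 1. The cutoff is a modeling choice that the abstract does not specify.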
738	Time-lapse monitoring of sidewall mass-wasting events in a Northeast Tennessee gully McConnell, Nicholas, Luffman, Ingrid, Nandi, Arpita 05 April 2018 In the southern Appalachians, the dominant soil order, Ultisols, is highly susceptible to erosion. If left unmanaged, these soils can develop into gully systems, resulting in land degradation. This study examines gully development through sidewall mass-wasting events at high temporal resolution using 30-minute time-lapse photography. Prior research at this site found significant mass-wasting events occurring between weekly monitoring periods. Shortening the observation interval to 30 minutes allows a more accurate understanding of the frequency and intensity of these mass-wasting events and of their relation to meteorological factors. Photographs of a gully (approximately 1.5 m deep and 3 m wide at the top) were captured every 30 minutes from 11/29/2017 to 2/18/2018. They were taken with a WingScape outdoor time-lapse camera mounted on a plastic stake 3.16 m from the gully, facing northwest and upstream into the gully channel. A total of n = 1648 images were coded using presence/absence indices for six observed geomorphic processes: creep on the NE-facing sidewall, creep on the SW-facing sidewall, slump on the NE-facing sidewall, slump on the SW-facing sidewall, channel aggradation, and channel development. Precipitation and temperature data were collected every 5 minutes by a Davis Vantage Pro 2 weather station located 240 m from the gully and were aggregated to various time intervals. Precipitation received in the previous 0.5, 1, 1.5, 2, 3, 6, 12, 24, 36, 48, and 72 hours was calculated for each image. Two binary temperature variables were generated, with a value of “1” if the temperature dropped below 0 °C (32 °F) during the prior 30 minutes or 24 hours, respectively, and “0” otherwise.
Logistic regression models (forward conditional method) for the six geomorphic index variables were generated using the precipitation and temperature data. For creep on the NE-facing sidewall, the significant independent variables were rainfall in the prior 3 and 72 hours and freeze conditions in the previous 0.5 and 24 hours. On the SW-facing sidewall, rain and temperature variables were also important for creep: rain in the previous 12 and 24 hours and freeze conditions within the previous 24 hours were retained in the model. For slumping on both the NE- and SW-facing sidewalls, recent and prolonged rain were important. Specifically, 1-, 6-, 12-, and 24-hour rainfall were retained in both models, with 3-hour rainfall additionally retained in the NE-facing sidewall slump model. No temperature variables were retained. For channel aggradation (deposition of material in the channel), rain in the prior 12 and 72 hours and freezing in the prior 24 hours were important. This suggests that freeze-thaw processes loosen the soil and subsequent rain events carry material into the channel, where it is deposited once the rain stops. Interestingly, no viable model could be developed for channel development (erosion) using these parameters. These results will be useful for quantifying meteorological controls on gully erosion at short temporal scales. time-lapse monitoring gully erosion soil creep logistic regression soil slumping Geomorphology Multivariate Analysis Physical and Environmental Geography Soil Science Statistical Models
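The predictors described above are rainfall totals over eleven look-back windows plus two binary freeze flags, computed for each image from 5-minute weather records. The minimal Python sketch below assumes records arrive as (timestamp, rainfall, temperature in °C) tuples. It also assumes that a record counts toward a window when its timestamp falls in (image_time − window, image_time]. The tuple layout, feature names, and boundary convention are assumptions, not the study's actual processing, and the weather station's export format is not modeled.

```python
from datetime import datetime, timedelta

# Look-back windows (hours) listed in the abstract.
RAIN_WINDOWS_H = [0.5, 1, 1.5, 2, 3, 6, 12, 24, 36, 48, 72]
FREEZE_WINDOWS_H = [0.5, 24]


def antecedent_features(image_time: datetime,
                        records: list[tuple[datetime, float, float]]) -> dict[str, float]:
    """records: (timestamp, rainfall, temperature_c) rows, e.g. at 5-minute intervals.

    A record counts toward a window if image_time - window < timestamp <= image_time.
    """
    feats: dict[str, float] = {}
    for h in RAIN_WINDOWS_H:
        start = image_time - timedelta(hours=h)
        feats[f"rain_{h}h"] = sum(rain for t, rain, _ in records if start < t <= image_time)
    for h in FREEZE_WINDOWS_H:
        start = image_time - timedelta(hours=h)
        # 1 if the temperature dropped below 0 °C at any record in the window, else 0.
        feats[f"freeze_{h}h"] = int(any(temp < 0 for t, _, temp in records if start < t <= image_time))
    return feats


# Hypothetical 2 hours of 5-minute records before a noon image, with one sub-zero reading at 11:00.
image_time = datetime(2018, 1, 10, 12, 0)
records = [(image_time - timedelta(minutes=5 * i), 0.2, -0.5 if i == 12 else 1.0) for i in range(24)]
feats = antecedent_features(image_time, records)
print(feats["rain_0.5h"], feats["freeze_0.5h"], feats["freeze_24h"])
```

Each image's feature dictionary then becomes one row of the design matrix for the six presence/absence logistic models.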
739	The predictive influence of variables in three different academic learning environments on the intentions of music education majors to leave the degree program. Corley, Alton L. 05 1900 Attrition among students in music teacher training programs has contributed to a shortage of qualified music teachers for the nation's schools. The purpose of this study was to investigate the predictive relationship between academic variables in three different learning environments and the intent of a select population of music education majors to leave the degree program. The study drew on the work of Tinto, Bean, and Astin to form a theoretical foundation for examining variables unique to student withdrawal from the music education degree plan. Variables were examined within the context of three different learning environments: (1) applied lessons, (2) ensembles, and (3) non-performance courses. Participants were 95 freshman and sophomore music education majors at a public university who were enrolled in the music education degree program during the spring 2002 semester. Data included participant responses on the Music Student Inventory (MSI), a questionnaire developed specifically for the study, and grade data from university records. Independent variables in the study included participants' perceptions of (1) ensemble experiences, (2) applied lesson experiences, (3) non-performance music course experiences, (4) course requirements, and (5) performance growth. Additional variables included (1) ensemble placement, (2) course grades for music theory, applied lessons, and aural skills, and (3) cumulative grade point averages. Gender interactions were also examined. The dependent variable in the study was intent to withdraw from the music education program. Data were analyzed using a binary logistic regression procedure.
Results of the analysis indicated that none of the variables tested were statistically significant predictors of subjects' intentions to withdraw from the music education degree program. No gender interactions were evident among the variables. Although not statistically significant, lesson experiences was the strongest predictor among the variables represented by questionnaire responses. The analysis of course grades for music theory, applied lessons, and aural skills failed to produce a statistically significant main effect, but applied lesson grades produced the strongest effect in the model. The results suggest that students' intentions to withdraw from the music education program are related to variables other than those representing the program's academic component. Education, Higher. Music education majors attrition academic variables applied lessons ensemble experience logistic regression
740	The Occurrence of Rupture in Deep-Drawing of Paperboard Wallmeier, Malte, Hauptmann, Marek, Majschak, Jens-Peter 01 August 2018 The production of paperboard packaging components in fast-running machines requires a reliable production process. Boundaries for the process parameters and constraints on the tool geometry must be investigated to determine dependable configurations. This paper investigates the relationships between process parameters, tool geometry, and the occurrence of rupture in the deep-drawing of paperboard. Different types of rupture in various phases of the process were distinguished and linked to their specific causes. An extensive experimental investigation with multiple influencing variables was conducted. A logistic regression model was used to describe the experimental data and was statistically validated. The blankholder force was found to be the most influential parameter. Interactions among the parameters blankholder force, punch velocity, and punch diameter were identified. A high punch velocity can reduce the probability of rupture when the punch diameter is adjusted. ddc:620
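The abstract's finding that punch velocity can lower rupture probability "when the punch diameter is adjusted" is what an interaction term in a logistic model expresses. With a velocity × diameter term, the effect of velocity on the log-odds is b_velocity + b_velocity_x_diameter · diameter, so its sign can change with diameter. The minimal Python sketch below includes only that one interaction; the paper's other interactions involving blankholder force are omitted. All coefficient values and units are hypothetical placeholders, not the paper's fitted model.

```python
import math

# Hypothetical coefficients for illustration only; they are NOT the paper's fitted values.
BETA = {
    "intercept": -4.0,
    "force": 0.8,
    "velocity": 0.5,
    "diameter": 0.05,
    "velocity_x_diameter": -0.1,
}


def rupture_probability(force: float, velocity: float, diameter: float,
                        beta: dict[str, float] = BETA) -> float:
    """Logistic model of rupture with a single velocity-by-diameter interaction term."""
    eta = (beta["intercept"]
           + beta["force"] * force
           + beta["velocity"] * velocity
           + beta["diameter"] * diameter
           + beta["velocity_x_diameter"] * velocity * diameter)
    # Numerically stable inverse-logit.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def velocity_slope(diameter: float, beta: dict[str, float] = BETA) -> float:
    """Change in log-odds of rupture per unit of punch velocity at a given punch diameter."""
    return beta["velocity"] + beta["velocity_x_diameter"] * diameter


print(velocity_slope(2.0), velocity_slope(10.0))
```

With these placeholder values, a higher velocity raises the predicted rupture risk at a small diameter and lowers it at a large one. This is the qualitative pattern an interaction allows; it is not a claim about the paper's actual coefficients.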
