Global ETD Search

211	Redes Bayesianas aplicadas à análise do risco de crédito. / Bayesian networks applied to the analysis of credit risk. Cristiane Karcher 26 February 2009 (has links) Credit scoring models are used to estimate the probability that a credit applicant will default within a given period, based on the applicant's personal and financial information. In this work, the technique proposed for credit scoring is Bayesian networks (BN), and its results were compared with those of logistic regression. The BN evaluated were Bayesian network classifiers with the following structure types: Naive Bayes, Tree Augmented Naive Bayes (TAN) and General Bayesian Network (GBN). The BN structures were obtained by structure learning from a real database. Model performance was evaluated and compared using the hit rates from the confusion matrix, the Kolmogorov-Smirnov statistic and the Gini coefficient. The development and validation samples were obtained by 10-fold cross-validation. The analysis of the fitted models showed that the BN and logistic regression performed similarly with respect to the Kolmogorov-Smirnov statistic and the Gini coefficient. The TAN classifier was chosen as the best model because it performed best in predicting bad payers and allowed an analysis of interaction effects between variables. Credit Statistical inference Generalized linear models Bayesian networks Credit risk Logistic regression
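The Kolmogorov-Smirnov statistic and Gini coefficient named in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's code: the scores are invented, higher scores are assumed to mean a better payer, and the Gini coefficient is taken in its common credit-scoring form, 2 * AUC - 1.

```python
from bisect import bisect_right


def ks_statistic(good_scores, bad_scores):
    """Largest gap between the empirical CDFs of the two score samples."""
    good = sorted(good_scores)
    bad = sorted(bad_scores)
    largest_gap = 0.0
    for threshold in set(good) | set(bad):
        cdf_good = bisect_right(good, threshold) / len(good)
        cdf_bad = bisect_right(bad, threshold) / len(bad)
        largest_gap = max(largest_gap, abs(cdf_good - cdf_bad))
    return largest_gap


def gini_coefficient(good_scores, bad_scores):
    """Gini = 2 * AUC - 1, with AUC from all good/bad score pairs."""
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0   # good payer ranked above bad payer
            elif g == b:
                wins += 0.5   # ties count as half
    auc = wins / (len(good_scores) * len(bad_scores))
    return 2.0 * auc - 1.0


# Hypothetical model scores (assumption: higher = more creditworthy).
good = [0.9, 0.8, 0.7, 0.6, 0.4]
bad = [0.5, 0.3, 0.2, 0.1]

ks = ks_statistic(good, bad)
gini = gini_coefficient(good, bad)
print(f"KS = {ks:.2f}, Gini = {gini:.2f}")
```

Both measures depend only on how the scores rank good and bad payers, which is why models as different as a TAN classifier and logistic regression can be compared on them directly.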
212	Modelos para análise de dados superdispersos de indução de haploidia em milho / Models for the analysis of overdispersed haploid induction data in maize Silva, Andreza Jardelino da 09 February 2017 (has links) Maize is an allogamous species whose commercial product is hybrids, which originate from crossing two inbred lines. One way to obtain such lines is through haploid induction followed by the production of doubled haploids, which allows up to 100% homozygosity. These techniques yield important results in maize breeding. An important variable of interest obtained from them is the haploid induction rate, the proportion of haploid seeds among the total number of seeds. The data set was obtained by crossing the inducer line LI-ESALQ with five commercial maize genotypes (2B587PW, 30F53H, BM820, DKB390 and STATUS VIPTERA), in two generations, F1 and F2, in a randomized block design in the experimental area of the Department of Genetics, ESALQ/USP. The theory of generalized linear models (GLMs) offers more options for the distribution of the response variable, requiring only that it belong to the exponential family in canonical form. This class can be further extended to models that allow random effects in the linear predictor, which form the class of generalized linear mixed models (GLMMs). The aim of this work was to analyze the haploid induction rate in tropical maize using a binomial mixed model with an individual-level random effect. Parameters were estimated by maximum likelihood. Based on this model, genotype 30F53H stood out from the others in the efficiency of the haploid induction rate. All analyses were implemented in the software R. Haploid induction Logistic regression Mixed model Overdispersion
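To illustrate what "overdispersed" means for a rate such as the haploid induction rate, the sketch below computes the Pearson dispersion statistic for binomial counts under a single common proportion. It is an assumption-laden simplification: the counts are invented, and the null model ignores the genotype, generation, and block structure that the thesis's mixed model accounts for.

```python
def pearson_dispersion(haploid_counts, total_counts):
    """Pearson chi-square divided by its degrees of freedom.

    Values near 1 are consistent with plain binomial variation;
    values well above 1 suggest overdispersion.
    """
    groups = len(total_counts)
    p_hat = sum(haploid_counts) / sum(total_counts)  # pooled proportion
    chi_square = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for y, n in zip(haploid_counts, total_counts)
    )
    return chi_square / (groups - 1)


# Hypothetical data: haploid seeds out of 100 seeds on each of six ears.
haploid = [2, 15, 4, 20, 1, 12]
total = [100] * 6

dispersion = pearson_dispersion(haploid, total)
print(f"dispersion = {dispersion:.2f}")
```

A dispersion this far above 1 is the kind of extra-binomial variation that motivates adding an individual-level random effect to the linear predictor.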
213	Site Location Modeling and Prehistoric Rock Shelter Selection on the Upper Cumberland Plateau of Tennessee Langston, Lucinda M 01 May 2013 (has links) Using data collected from 2 archaeological surveys of the Upper Cumberland Plateau (UCP), Pogue Creek Gorge and East Obey, a site location model was developed for prehistoric rock shelter occupation in the region. Further, the UCP model was used to explore factors related to differential site selection of rock shelters. Different from traditional approaches such as those that use (aspatial) logistic regression, the UCP model was developed using spatial logistic regression. However, models were also generated using other regression-based approaches in an effort to demonstrate the need for a spatial approach to archaeological site location modeling. Based on the UCP model, proximity to the vegetation zones of Southern Red Oak and Hickory were the most influential factors in prehistoric site selection of rock shelters on the UCP. Predictive Modeling GIS Prehistory Rock Shelters Spatial Logistic Regression Geospatial Analysis Archaeological Anthropology Geographic Information Sciences
214	Prevalence of and Risk Factors for Adolescent Obesity in Tennessee Using the 2010 Youth Risk Behavior Survey (YRBS) Data: An Analysis Using Weighted Hierarchical Logistic Regression Zheng, Shimin, Holt, Nicole, Southerland, Jodi L, Cao, Yan, Taylor, Trevor, Slawson, Deborah L, Bloodworth, Mark 29 October 2016 (has links) Background: The rate of adolescent overweight and obesity has more than quadrupled over the past few decades, and has become a major public health problem [1]. In 2011, 55% of 12-19 year olds in the United States (U.S.) were overweight or obese [2]. Adolescence is a pivotal time in which many health risk behaviors such as tobacco, alcohol, and drug use are initiated. Such health risk behaviors have been significantly associated with overweight and obesity among adolescents. Objective: The purpose of this study is to evaluate the relationship between obesity and the health risk behaviors most commonly associated with premature morbidity and mortality among adolescents with a novel micro area estimate approach that uses weighted hierarchical logistic regression to nest individuals in classes, classes in schools, and schools in districts. Methods: This study is a secondary analysis of a state-wide representative sample of middle school students that participated in the 2010 Tennessee Middle School Youth Risk Behavior Survey (YRBS). Data was collected from 119 (85.6%) of Tennessee’s local education agencies (LEAs), 456 (95.2%) schools, and 64,790 of 78,441 (82.6%) students. The outcome variable was adolescent obesity (≥ 95th BMI percentile). 
Explanatory variables were divided into four levels: [1] district level: seatbelt/helmet use, asked to show ID for tobacco purchase; [2] school level: ever tried smoking, received HIV education in school; [3] class level: average number of days smoked, having ever exercised to lose weight; [4] individual level: having ever been in a fight, early onset of substance use, physical activity, and having thought about, planned, or attempted suicide. Weighted hierarchical logistic regression analysis was performed to assess the association between risk or protective factors and obesity using effect size (ES) and odds ratio (OR) estimates. Results: The study sample included 64,790 middle school students in the state of Tennessee with a mean age of 12.8 years, of whom 49.42% were female and 50.58% were male. Nearly one-fourth of the students (22.30%) had a BMI at or above the 95th percentile. Weighted hierarchical logistic regression analysis showed that seatbelt and helmet use [ES: -2.161, OR: 0.020, 95% CI: (0.006, 0.070)], weight misperception [ES: 1.256, OR: 9.720, 95% CI: (9.216, 10.251)], having ever exercised to lose weight [ES: -0.340, OR: 0.540, 95% CI: (0.446, 0.654)], having ever tried smoking [ES: 0.705, OR: 3.581, 95% CI: (2.637, 4.863)] and gender (male vs. female) [ES: 0.327, OR: 1.810, 95% CI: (1.740, 1.880)] were strongly associated with adolescent obesity. Results also showed that Black, Hispanic or Latino adolescents were more likely to be obese than White, Indian, and Asian adolescents [ES: 0.129, OR: 1.260, 95% CI: (1.200, 1.330)], that students with grades of mostly C, D and F were more likely to be obese than those with grades of mostly A and B [ES: 0.189, OR: 1.409, 95% CI: (1.303, 1.523)], and that having an eating disorder [ES: 0.251, OR: 1.576, 95% CI: (1.508, 1.648)] and engagement in sports teams [ES: -0.197, OR: 0.700, 95% CI: (0.674, 0.728)] had small or medium ES associations with adolescent obesity.
Conclusion: This study uses small area estimates in weighted hierarchical logistic regression models to describe the prevalence and distribution of health risk behaviors associated with adolescent obesity among middle school student subpopulations in Tennessee. The value of small area estimates has been demonstrated previously in a variety of other contexts, and here again offers important insights for intervention design and resource allocation at different micro-levels within small and large areas (i.e., district, school, and class). This work adds to the growing body of research that supports community-driven school-based lifestyle interventions targeting early-onset chronic disease and, more specifically, enhances the geographic resolution with which adolescent obesity can be addressed in middle school populations across Tennessee. obesity Tennessee Youth Risk Behavior Survey YRBS logistic regression Biostatistics and Epidemiology Biostatistics Community Health and Preventive Medicine
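The abstract does not say how its effect sizes (ES) were derived from the odds ratios. The reported pairs do, however, appear consistent with the standard logit conversion of an odds ratio to a standardized mean difference, d = ln(OR) * sqrt(3) / pi. The sketch below checks that reading against three of the reported values; treating ES as this quantity is an assumption, not something the source confirms.

```python
import math


def odds_ratio_to_d(odds_ratio):
    """Convert an odds ratio to a standardized mean difference (logit method)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# (odds ratio, reported ES) pairs copied from the abstract above.
reported = {
    "gender (male vs. female)": (1.810, 0.327),
    "ever exercised to lose weight": (0.540, -0.340),
    "ever tried smoking": (3.581, 0.705),
}

for factor, (odds_ratio, reported_es) in reported.items():
    converted = odds_ratio_to_d(odds_ratio)
    print(f"{factor}: converted {converted:.3f}, reported {reported_es:.3f}")
```

Under this reading, an odds ratio below 1 maps to a negative ES, which matches the sign pattern of the protective factors in the abstract.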
215	Semantic frame based automatic extraction of typological information from descriptive grammars Aslam, Irfan January 2019 (has links) This thesis project addresses the machine learning (ML) modelling aspects of the problem of automatically extracting typological linguistic information about natural languages spoken in South Asia from annotated descriptive grammars. Without delving into the theory and methods of Natural Language Processing (NLP), the focus has been to develop and test an ML model dedicated to the information extraction part. Starting with the existing state-of-the-art frameworks to get labelled training data through the structured representation of the descriptive grammars, the problem has been modelled as a supervised ML classification task where the annotated text is provided as input and the objective is to classify the input into one of the pre-learned labels. The approach has been to systematically explore the data to develop an understanding of the problem domain and then evaluate a set of four potential ML algorithms using predetermined performance metrics, namely accuracy, recall, precision and F-score. It turned out that the problem splits into two independent classification tasks: a binary classification task and a multiclass classification task. The four selected algorithms, Decision Trees, Naïve Bayes, Support Vector Machines, and Logistic Regression, which span both linear and non-linear families of ML models, were independently trained and compared on both classification tasks. Performance metrics were measured using stratified 10-fold cross-validation, and the candidate algorithms were compared. Logistic Regression provided the best overall results, with Decision Trees a close runner-up. Finally, the Logistic Regression model was selected for further fine-tuning and used in a web demo of a typological information extraction tool developed to show the usability of the ML model in the field.
Automatic Information Extraction Spoken Languages Typological Linguistic Information Logistic Regression Classification Computer Sciences Datavetenskap (datalogi)
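The stratified 10-fold cross-validation mentioned in this abstract can be sketched with the standard library alone. This is an illustrative splitter under assumed labels, not the thesis's pipeline: each class is shuffled and dealt round-robin across the folds so that every fold keeps roughly the overall class proportions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of sample indices with per-class counts balanced."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples across the folds one at a time.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical binary labels: 30 positive and 70 negative examples.
labels = ["yes"] * 30 + ["no"] * 70
folds = stratified_folds(labels, k=10)

for fold_number, test_indices in enumerate(folds):
    positives = sum(labels[i] == "yes" for i in test_indices)
    print(f"fold {fold_number}: {len(test_indices)} samples, {positives} positive")
```

Each fold serves once as the test set while the other nine form the training set, so every candidate algorithm is scored on the same ten splits.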
216	Identification of serum biomarkers in patients of exfoliative glaucoma in Scandinavian population: Autoimmune profiling by microarray technology. / Biomarker detection in exfoliative glaucoma. Khan, Sabeen Asad January 2019 (has links) Glaucoma is the leading cause of irreversible blindness, estimated to affect more than 79 million people by the year 2020. It is a group of optic neuropathies that is found to be associated with autoimmunity. One of its types is exfoliative glaucoma, which is more prevalent in certain areas of the world, including Scandinavia. It is more aggressive and often resistant to conventional therapy. The best treatment option for glaucoma lies in early detection of the disease. The aim of the study was to identify serum biomarkers in patients with exfoliative glaucoma in the Scandinavian population. Serum samples from 30 patients with exfoliative glaucoma and 10 control subjects were profiled on epoxy-coated protein microarrays expressing immobilized His-tagged human antigens. A total of 3072 antigens were selected after a literature review, including those expressed in the eye and retina. Protein microarrays were incubated with sera, and the resulting immunoreactivities were visualized with fluorescence-labelled secondary antibodies. To detect changes, spot intensities were digitized and analysed with different statistical methods. Binary logistic regression was used to classify diseased subjects and controls. A significant increase of antibodies against the IRAK4 antigen was detected in serum samples of the controls (p = 0.002) compared with the exfoliative glaucoma patients. Antibodies against four other antigens, FUT2, VAV2, GPATCH8 and PFKFB1, were found to be more prevalent in serum samples of exfoliative glaucoma patients, although not significantly. The logistic regression was able to classify diseased subjects and controls with 100 percent accuracy based on 11 selected reactive antigens. Out of the 3072 antigens, IRAK4 was found to be the only significant antigen, with increased reactivity in controls compared with exfoliative glaucoma patients. IRAK4 has a role in innate immunity and signal transduction; antibodies against it may have a neuroprotective effect in glaucoma. However, this is an initial exploratory study based on only 40 samples, and further experiments with a larger sample size need to be performed. Glaucoma Exfoliative Microarray Autoantibodies Logistic regression Medical and Health Sciences Medicin och hälsovetenskap
217	A logistic regression analysis of score sending and college matching among high school students Oates, Krystle S. 01 December 2015 (has links) College decisions are often the result of a variety of influences related to student background characteristics, academic characteristics, college preferences and college aspirations. College counselors recommend that students choose a variety of schools, especially schools where the general student body matches the academic achievement of students. These types of schools are generally referred to as match schools. This thesis examined the initial college decisions of high school students in a large Midwestern state, who were an academic match for selective and highly selective schools by observing the student characteristics that were most influential in predicting college matching for students’ initial first choice institution. This thesis also observed college enrollment among students who chose a match school as their first choice institution, college matching over a time period from 1992 to 2013, and college matching after the implementation of a state initiative designed to help students apply for college. Logistic regression along with descriptive statistics were used as the primary analyses for college matching. Results from these analyses showed that students belonging to underrepresented minority groups had odds of college matching for their first choice institution that were significantly greater than white students. Students whose parents earned at least a bachelor’s degree had odds that were significantly greater than students whose parents had not earned a bachelor’s degree. Also, students whose coursework included calculus and physics, and students who planned to earn a graduate degree had significantly greater odds of matching on their first choice institution than students who were not a part of these respective groups. 
Among students in the sample who chose a match school for their first choice institution, students who had at least one parent earn up to a bachelor’s degree were significantly more likely to enroll in a match school. Also, the percentage of students at a single high school who were eligible for free and reduced lunch was negatively associated with the odds of students enrolling in a match school. To observe score sending among students to their first choice institution over time, an additional variable, “year,” was added to the logistic regression model to compare the years 2000, 2008 and 2013 with 1992. The results of this logistic regression analysis showed that students’ odds of choosing a match school for their first choice institution were significantly lower in 2008 and 2013 than in 1992. College matching for students who attended high schools served by the state initiative was compared using the percentage differences in college matching before and after the implementation of the program. However, results could not be interpreted with certainty due to the small size of the sample. Academic Match College choice College match Higher Education Logistic Regression Score Sending Educational Psychology
218	Psycho-Socio-Cultural Risk Factors for Breech Presentation Peterson, Caroline 02 July 2008 (has links) The Breech Baby Study is a mixed methods study which combines qualitative and quantitative inquiry. This study explores psycho-social-cultural risk factors for breech presentation from an evolutionary perspective. The quantitative component of the study uses Florida birth certificate and Medicaid data sets from 1992-2003 to evaluate the influence of ethnicity and socio-economic status on breech presentation. Ethnicity and socio-economic status account for less than two percent of the variance of risk factors for breech presentation. The qualitative study includes 114 mothers of breech and cephalic presentation babies who completed the State Trait Personality Inventory and a socio-demographic survey. Of these, 52 mothers of cephalic presentation babies and 23 mothers of breech presentation also participated in an in-depth interview about formative life experiences and peri-conception through delivery. The primary data analysis found mothers of breech presentation babies exhibit psycho-social-cultural characteristics unlike those found in mothers of cephalic presentation babies. These characteristics include being idealistic, analytical, polished, overextended, and fearful. Mothers of cephalic presentation babies were better equipped to adapt to unexpected situations and to be pragmatic in the face of unresolvable circumstances. Mothers of breech presentation babies were further separated into two categories. One category is achievement focused woman while the other is non-present focused woman. While both sets of breech presentation mothers were idealistic, the achievement focused mothers were more likely to be analytical, polished, and overextended. In contrast, the non-present focused mothers had a history of abuse and were more likely to have an unresolved pregnancy outcome or to be fearful. 
Breech presentation is interpreted by attachment theory, evolutionary ecological reproductive theory, and developmental plasticity theory as a fetal strategy to adapt to the intra-uterine relationship environment and an attempt to predict the extra-uterine relationship environment. Maternal fetal attachment Evolution Developmental plasticity Logistic regression Personality American Studies Arts and Humanities
219	Organizational Form of Disease Management Programs: A Transaction Cost Analysis Chandaver, Nahush 14 November 2007 (has links) Patient care programs such as wellness, preventive care and specifically disease management programs, which target the chronically ill population, are designed to reduce healthcare costs and improve health, while promoting the efficient use of healthcare resources, and increasing productivity. The organizational form adopted by the health plan for these programs, i.e. in-sourced vs. outsourced is an important factor in the success of these programs and the extent to which the core objectives listed above are fulfilled. Transaction cost economics aims to explain the working arrangement for an organization and to explain why sourcing decisions were made by considering alternate organizational arrangements and comparing the costs of transacting under each. This research aims to understand the nature and sources of transaction costs, how they affect the sourcing decision of disease management and other programs, and its effect on the organization, using current industry data. Predictive models are used to obtain empirical results of the influence of each factor, and also to provide cost estimates for each organizational form available, irrespective of the form currently adopted. The analysis of the primary data obtained by the means of a web-based survey supports and confirms the effect of transaction cost factors on these programs. This implies that in order to reap financial rewards and serve patients better, health plans must aim to minimize transaction costs and select the organizational form that best accomplishes this objective. Predictive modelling Logistic regression Selection bias Inverse mills ratio Outsourcing Integration American Studies Arts and Humanities
220	INFERENCE USING BHATTACHARYYA DISTANCE TO MODEL INTERACTION EFFECTS WHEN THE NUMBER OF PREDICTORS FAR EXCEEDS THE SAMPLE SIZE Janse, Sarah A. 01 January 2017 (has links) In recent years, statistical analyses, algorithms, and modeling of big data have been constrained by computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, makes identifying predictors using standard statistical techniques difficult. These difficulties are only exacerbated by the small sample sizes of some studies. Recent analyses have targeted the identification of interaction effects in big data, but the development of methods to identify higher-order interaction effects has been limited by computational concerns. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that aims to find a set of statistically optimal models via a stochastic search algorithm. Although FSA has shown promise, one current limitation is that the user must choose the number of times to run the algorithm. Here, statistical guidance is provided for this number of iterations by deriving a lower bound on the probability of obtaining the statistically optimal model in a given number of iterations of FSA. Moreover, logistic regression is severely limited when two predictors can perfectly separate the two outcomes. With small sample sizes, this occurs quite often by chance, especially when the number of predictors is large. Bhattacharyya distance (B-distance) is proposed as an alternative method to address this limitation. However, little is known about the theoretical properties or distribution of B-distance. Thus, properties and the distribution of this distance measure are derived here. A hypothesis test and confidence interval are developed and tested on both simulated and real data.
Bhattacharyya Distance model selection Feasible Solutions Algorithm perfect separation interaction effects logistic regression Statistical Methodology
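For readers unfamiliar with the Bhattacharyya distance, the sketch below computes it for two discrete probability distributions using the standard definition, D_B = -ln(sum of sqrt(p_i * q_i)). The abstract does not give the dissertation's exact formulation, so the discrete form and the example distributions are assumptions chosen for illustration.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Bhattacharyya coefficient: overlap between the two distributions.
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # no overlap at all
    return -math.log(coefficient)


# Hypothetical distributions of a binary predictor within each outcome group.
outcome_a = [0.5, 0.5]
outcome_b = [0.9, 0.1]

distance = bhattacharyya_distance(outcome_a, outcome_b)
print(f"B-distance = {distance:.4f}")
```

When the two groups do not overlap at all, the distance is infinite. That is the perfect-separation case in which the logistic regression maximum likelihood estimate does not exist, so a distance of this kind can still quantify how far apart the groups are as they approach separation.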
