1 |
Co-relation of Variables Involved in the Occurrence of Crane Accidents in U.S. through Logit Modeling. Bains, Amrit Anoop Singh 2010 August 1900
One of the primary reasons for the escalating rates of injuries and fatalities in the construction industry is the complex, dynamic and continually changing nature of construction work. The use of cranes has become imperative to overcome technical challenges, which has led to an escalation of danger on construction sites. Data from OSHA show that crane accidents increased rapidly from 2000 to 2004. By analyzing the characteristics of all the crane accident inspections, we can better understand the significance of the many variables involved in a crane accident.
For this research, data were collected from the U.S. Department of Labor website via the OSHA database. The data encompass crane accident inspections for all the states. The data were divided into categories with respect to accident types, construction operations, degree of accident, fault, contributing factors, crane types, victim's occupation, organs affected and load. Descriptive analysis was performed to complement the previous studies, the only difference being that both fatal and non-fatal accidents were considered.
Multinomial regression was applied to derive probability models and to relate the different accident types to the factors involved in each. A log-likelihood test and a chi-square test were performed to validate the models. The results show that electrocution, crane tip-over and crushing during assembly/disassembly have a higher probability of occurrence than other accident types. Load is not a significant factor in crane accidents, and manual fault is a more probable cause of a crane accident than technical fault. Construction operations identified in the research were found to be significant for all the crane accident types. Mobile crawler cranes, mobile truck cranes and tower cranes were found to be more susceptible. These probability models are limited in their ability to incorporate unforeseen variables in construction accidents. Because the models use the past to portray the future, any significant change in the variables involved must be added to obtain correct and useful results.
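As a rough illustration of the kind of model described in this abstract, the sketch below computes category probabilities from a multinomial logit with one reference category and forms a likelihood-ratio statistic from two log-likelihoods. It is not the author's code: the function names, the coefficient layout and the convention that the first category is the reference are all assumptions made for the example.

```python
import math


def multinomial_logit_probs(x, coefs):
    """Category probabilities for one observation.

    x     : list of covariate values (include a leading 1.0 for an intercept).
    coefs : one coefficient list per non-reference category; the reference
            category (index 0) has a linear predictor fixed at zero.
    """
    scores = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in coefs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(X, y, coefs):
    """Sum of log-probabilities of the observed categories y (0-based)."""
    return sum(math.log(multinomial_logit_probs(x, coefs)[k]) for x, k in zip(X, y))


def likelihood_ratio_stat(ll_null, ll_full):
    """Likelihood-ratio statistic; compared against a chi-square distribution
    whose degrees of freedom equal the number of extra parameters."""
    return 2.0 * (ll_full - ll_null)
```

In practice the coefficients would come from a fitted model; here they are simply passed in, so the sketch only shows how probabilities and the test statistic are formed.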
|
2 |
Finns det blockpolitiska skillnader i kommunal skattepolitik? / Are there bloc-political differences in municipal tax policy? Hägerdal, Erik; Sjögren, Joakim January 2011
In this thesis we have attempted to model the probability of changes in the municipal tax rate in Swedish municipalities given their political rule, coded according to the definition used by the Swedish Association of Local Authorities and Regions (Sveriges Kommuner och Landsting, SKL). In doing so we try to answer the question of whether there are bloc-political differences in municipal tax policy. By combining SKL's classification of municipal political rule in 2002-2006 with that of 2006-2010, we have created so-called power-shift categories, which we then use as categorical variables when modelling the probability of different changes in municipal tax rates. To isolate, to some extent, the influence of politics and separate it from other structures that may affect the tax rate, we have created variables that we call control variables. These attempt to reflect structural conditions for the municipal economy. Two of them are the change in population density and the share of the municipality's total population that is gainfully employed. To give the reader a descriptive picture of our data, in the descriptive part of the thesis we have used so-called Markov chains to give an initial picture of the probability of a raised, lowered or unchanged tax rate given changes in political rule before and after the 2006 election. The probabilities of the three possible outcomes (raised, lowered or unchanged tax) are modelled with two multinomial logistic regressions, first without control variables and then with them. To see the influence of politics more clearly, we have chosen to exclude 30 municipalities according to certain delimitation criteria. We have excluded municipalities that had "shifting" majorities between elections. At the end of the results section we also try estimating a multinomial logistic model in which minority governments are excluded.
Within some political power-shift categories we had too few observations to model the probability of a lowered tax; the data are insufficient to build a stable model with significant parameters. After estimating our models we compare the coefficients that are significant at the 5% level, and compare the values of the coefficient estimates in order to rank power shifts by their effect on the probability of a raised tax. We then evaluate our result and the factors and characteristics of municipal tax policy during 2006-2010 that may have affected it. Our result shows that several types of power shift have a significant effect on the probability distribution for a raised tax, and although we cannot draw any conclusions about the probability of a lowered tax, the estimations still help us analyse the influence of bloc politics on municipal tax policy.
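The descriptive step mentioned in this abstract, estimating the probability of each tax outcome given a power-shift category, amounts to row-normalising a table of counts. The sketch below shows one way to do that; the function name and the category and outcome labels in the usage note are hypothetical and do not come from the thesis.

```python
from collections import Counter, defaultdict


def conditional_outcome_probs(pairs):
    """Estimate P(outcome | category) from (category, outcome) observations.

    Each probability is the observed share of that outcome within the
    category, i.e. a row-normalised contingency table.
    """
    counts = defaultdict(Counter)
    for category, outcome in pairs:
        counts[category][outcome] += 1
    return {
        category: {o: n / sum(c.values()) for o, n in c.items()}
        for category, c in counts.items()
    }
```

Called with a list such as `[("category_a", "raised"), ("category_a", "unchanged")]`, it returns `0.5` for each outcome within `category_a`. Categories with very few observations give unstable estimates, which is the same small-sample limitation the abstract reports for lowered tax rates.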
|
3 |
Predições estatísticas para dados politômicos / Statistical predictions for polytomous data. Requena, Guaraci de Lima 17 August 2018
This work generalizes the partition of the multivariate Bernoulli distribution into Bernoulli distributions and shows how this partition leads to a regression model and to a classifier for polytomous data. As a starting point, we want to make explicit the link function for multinomial regression models and write it in terms of distribution functions, as in the binomial case, in order to make it more flexible than the usual logit. To that end, we study the factorizations of the multivariate Bernoulli into Bernoullis, as well as of the multinomial into binomials, in order to make explicit how distribution functions may play a role in linking the space of covariates to the vector of probabilities. Basu and Pereira (1982) explore these factorizations in a nonresponse problem and Pereira and Stern (2008) generalize them to a class of factorizations. This work proposes a simplification both of multinomial regression, adding the flexibility of the binomial case, and of polytomous classification, by decomposing the polytomous problem into dichotomous ones through the generalization of the class of factorizations. A computational problem arises because this class may contain a very large number of distinct elements, depending on the number of categories, so we propose two approaches that search, step by step, for a factorization that minimizes the binomial classification risks involved. The motivation for this work is presented in order to study the performance of such regression models and classifiers. We start from a medical problem, specifically in obsessive-compulsive disorder, in which we want to classify an individual in order to obtain a purer phenotype of the disorder and to model it in order to find the covariates related to that phenotype, using a real dataset.
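To make the idea of splitting a polytomous problem into dichotomous ones concrete, the sketch below uses one well-known sequential factorization: category 1 versus the rest, then category 2 versus the remaining categories, and so on. This is only a single member of the class of factorizations the abstract refers to, chosen here for illustration; the function names are assumptions and nothing below reproduces the thesis's search procedure.

```python
def to_conditional(p):
    """Convert category probabilities p_1..p_k into k-1 conditional
    (Bernoulli) probabilities q_j = P(Y = j | Y >= j)."""
    q = []
    remaining = 1.0  # probability mass not yet assigned to earlier categories
    for pj in p[:-1]:
        q.append(pj / remaining)
        remaining -= pj
    return q


def from_conditional(q):
    """Rebuild the category probabilities from the conditional probabilities."""
    p = []
    remaining = 1.0
    for qj in q:
        p.append(remaining * qj)
        remaining -= remaining * qj
    p.append(remaining)  # the last category takes whatever mass is left
    return p
```

Each `q_j` can be modelled by its own binary regression with any link function, which is where the flexibility of the binomial case enters; the order in which categories are peeled off is a modelling choice.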
|
5 |
Investigation into methods of predicting income from credit card holders using panel data. Osipenko, Denys January 2018
A credit card as a banking product has a dual nature, both as a convenient loan and as a payment tool. Credit card profitability prediction is a complex problem because of the variety of card holders' behaviour patterns, a fluctuating balance, and different sources of interest and transactional income. The state of a credit card account depends on the type of card usage and payment delinquency, and can be defined as inactive, transactor, revolver, delinquent, or default. The proposed credit card profit prediction model consists of four stages: i) utilisation rate and interest rate income prediction, ii) non-interest rate income prediction, iii) account state prediction with conditional transition probabilities, and iv) the aggregation of the partial models into a total income estimate. This thesis describes an approach to credit card account-level profitability prediction based on multistate and multistage conditional probability models with different types of income and compares methods to find the most accurate predictions. We use application, behavioural, card state, and macroeconomic characteristics as predictors. This thesis contains nine chapters: Introduction, Literature Review, six chapters giving descriptions of the data, methodologies and discussions of the results of the empirical investigation, and Conclusion. The Introduction gives the key points and main aims of the current research and describes the general schema of the total income prediction model. The Literature Review provides a systematic analysis of academic work on loan profit modelling and highlights the gaps in the application of profit scoring to credit card income prediction. Chapter 3 describes the data sample and gives an overview of its characteristics. Chapter 4 is dedicated to the prediction of credit limit utilisation and contains a comparative analysis of the predictive accuracy of different regression models.
We apply five methods, i) linear regression, ii) fractional regression, iii) beta-regression, iv) beta-transformation, and v) weighted logistic regression with binary data transformation, to utilisation rate prediction in one- and two-stage models. Chapters 5 and 6 are dedicated to modelling the transition probabilities between credit card states. Chapter 5 describes the general model setups, the model building methodology (transition probability prediction with conditional binary logistic, ordinal, and multinomial regressions), the data sample, and the univariate analysis of predictors. Chapter 6 discusses the estimation results for all types of regression and gives a comparative analysis of the models. Chapter 7 describes an approach to non-interest rate income prediction and contains a comparative analysis of panel data regression techniques such as pooled and four random effect methods. We consider two sources of non-interest income generation: i) interchange fees and foreign exchange fees from transactions via point-of-sale (POS) and ii) ATM fees from cash withdrawals. We compare the predictive accuracy of a one-stage approach, which uses a single linear model for the income amount estimation, and a two-stage approach, in which the income amount is conditional on the probability of a POS or ATM transaction. Chapter 8 aggregates the results from the partial models into a single model for total income estimation. We assume that a credit card account does not have a single particular state and a single behavioural type in the future, but has a chance to move to any of the possible states. The income prediction model is selected according to these states, and the transition probabilities are used as weights for the particular interest rate and non-interest rate income prediction models. The Conclusion highlights the contributions of this research.
We propose an innovative methodological approach to credit card income prediction as a system of models, which estimates the income from different sources and then aggregates the income estimates weighted by the state transition probabilities. The results of the comparative analysis of regression methods for i) the utilisation rate of the credit limit, ii) non-interest income prediction, iii) the use of panel data with pooled and random effects for profit scoring, and iv) account-level non-binary target transition probability estimation for credit cards can be used as benchmarks for further research and fill gaps in the empirical literature. Estimating the transition probability between states at the account level helps to avoid the memorylessness property of the Markov chain approach. We have investigated the significance of predictors for models of this type. The proposed modelling approach can be applied to the development of business strategies such as credit limit management and customer segmentation by profitability and behavioural type.
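The aggregation step described above, in which state-specific income predictions are weighted by transition probabilities, can be sketched as a probability-weighted sum. The function name, the dictionary layout and the numbers in the usage note are assumptions for illustration only; the state names are the five listed in the abstract.

```python
def expected_income(transition_probs, income_by_state):
    """Probability-weighted income estimate for one account.

    transition_probs : dict mapping next-period state -> probability.
    income_by_state  : dict mapping state -> predicted income in that state.
    """
    total_prob = sum(transition_probs.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to 1")
    return sum(p * income_by_state[state] for state, p in transition_probs.items())
```

For example, with made-up probabilities `{"inactive": 0.1, "transactor": 0.3, "revolver": 0.4, "delinquent": 0.15, "default": 0.05}` and made-up incomes `{"inactive": 0.0, "transactor": 10.0, "revolver": 50.0, "delinquent": 20.0, "default": -100.0}`, the weighted sum is 0 + 3 + 20 + 3 - 5 = 21.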
|
6 |
Analysis of Prokaryotic Metabolic Networks. Urquhart, Caroline 30 March 2011
Establishing group structure in complex networks is potentially very useful since nodes belonging to the same module can often be related by commonalities in their biological function. However, module detection in complex networks poses a challenging problem and has sparked a great deal of interest in various disciplines in recent years [5]. In real networks, which can be quite complex, we have no idea about the true number of modules that exist. Furthermore, the structure of the modules may be hierarchical, meaning they may be further divided into sub-modules and so forth. Many attempts have been made to deal with these problems, and because the methods involved vary considerably they have been difficult to compare [5]. The objectives of this thesis are (i) to create and implement a new algorithm that will identify modules in complex networks and reconstruct the network in such a way as to maximize modularity, (ii) to evaluate the performance of the new method and compare it to a popular method based on a simulated annealing algorithm, and (iii) to apply the new method, and a comparator method, to analyze the metabolic network of the bacterial genus Listeria, an important pathogen in both agricultural and human clinical settings.
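The abstract does not define the modularity it maximizes. Assuming the commonly used Newman-Girvan definition for an undirected, unweighted network, the quantity can be computed as in the sketch below; the function name and input layout are illustrative, and this only scores a given partition rather than searching for the best one.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges     : list of (u, v) pairs, each undirected edge listed once.
    community : dict mapping node -> community label.

    Q = sum over communities c of  l_c / m - (d_c / (2 m))^2,
    where m is the number of edges, l_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    intra = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, placing each triangle in its own module gives Q = 5/14, while placing all six nodes in one module gives Q = 0, which is why a modularity-maximizing method would prefer the two-module split.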
|
7 |
Modely s kategoriální odezvou / Models with categorical response. Faltýnková, Anežka January 2015
This thesis concentrates on regression models with a categorical response. It focuses on the model of logistic regression with a binary response and its generalization, in which two models are distinguished: multinomial regression with a nominal response and multinomial regression with an ordinal response. For each of the three models, the Wald test and the likelihood ratio test are derived. These theoretical derivations are then used to calculate the test statistics for specific examples in the statistical software R. The theory described in the thesis is illustrated by examples with small and large numbers of explanatory variables.
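For a single coefficient, the Wald test mentioned above reduces to comparing the squared ratio of the estimate to its standard error with a chi-square distribution on one degree of freedom. The sketch below shows that calculation; it is written in Python rather than the R used in the thesis, and the function name is an assumption.

```python
import math


def wald_test(estimate, std_error):
    """Wald statistic and p-value for H0: coefficient = 0.

    The statistic W = (estimate / std_error)^2 is asymptotically chi-square
    with 1 degree of freedom, so the p-value equals the two-sided normal
    tail probability of |z|.
    """
    z = estimate / std_error
    w = z * z
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p_value
```

The likelihood ratio test answers the same question by comparing the maximized log-likelihoods of the models with and without the coefficient; the two tests agree asymptotically but can differ in small samples.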
|
8 |
Farm structure and environmental context drive farmers’ decisions on the spatial distribution of ecological focus areas in Germany. Alarcón‑Segura, V., Roilo, S., Paulus, A., Beckmann, M., Klein, N., Cord, A. F. 14 August 2024
Context: Ecological Focus Areas (EFAs) were designed as part of the greening strategy of the common agricultural policy to conserve biodiversity in European farmland, prevent soil erosion and improve soil quality. Farmers receive economic support if they dedicate at least 5% of their arable farmland to any type of EFA, which can be selected from a list of options drawn up at the European Union level. However, EFAs have been criticized for failing to achieve their environmental goals and being ineffective in conserving farmland biodiversity, mainly because they are not spatially targeted and because they promote economic rather than ecological considerations in farm management decisions.
Objectives: We used a spatially explicit approach to assess the influence of farm and field context as well as field terrain and soil conditions on the likelihood of whether or not a particular EFA type was implemented in a field.
Methods: We used a multinomial model approach using field-level land use and management data from 879 farms that complied with the EFA policy in 2019 in the Mulde River Basin in Saxony, Germany. Geospatial environmental information was used to assess which predictor variables (related to farm context, field context or field terrain and soil conditions) increased the probability of a field being assigned to a particular EFA. We tested the hypothesis that productive EFAs are more often implemented on fields that are more suitable for agricultural production and that EFA options that are considered more valuable for biodiversity (e.g. non-productive EFAs) are allocated on fields that are less suitable for agricultural production.
Results: We found that farms embedded in landscapes with a low proportion of small woody features or nature conservation areas mainly fulfilled the EFA policy with productive EFAs (e.g. nitrogen fixing crops). Conversely, farms with a higher proportion of small woody features or nature conservation areas were more likely to adopt non-productive EFAs. As predicted, large and compact fields with higher soil fertility and lower erosion risk were assigned to productive EFAs. Non-productive EFAs were placed on small fields in naturally disadvantaged areas. EFA options considered particularly beneficial for biodiversity, such as fallow land, were allocated far away from other semi-natural or nature protection areas.
Conclusions: Our results highlight that the lack of spatial targeting of EFAs may result in EFA options being assigned to areas where their relative contribution to conservation goals is lower (e.g. farms with higher shares of protected areas) and absent in areas where they are most needed (e.g. high intensity farms). To ensure that greening policies actually promote biodiversity in European agriculture, incentives are needed to encourage greater uptake of ecologically effective measures on intensively used farms. These should be coupled with additional measures to conserve threatened species with specific habitat requirements.
|
9 |
Automatic map generation from nation-wide data sources using deep learning. Lundberg, Gustav January 2020
The last decade has seen great advances within the field of artificial intelligence. One of the most noteworthy areas is that of deep learning, which is nowadays used in everything from self-driving cars to automated cancer screening. During the same time, the amount of spatial data encompassing not only two but three dimensions has also grown, and whole cities and countries are being scanned. Combining these two technological advances enables the creation of detailed maps with a multitude of applications, civilian as well as military. This thesis aims at combining two data sources covering most of Sweden, laser data from LiDAR scans and a surface model from aerial images, with deep learning to create maps of the terrain. The target is to learn a simplified version of orienteering maps, as these are created with high precision by experienced map makers and represent how easy or hard it would be to traverse a given area on foot. The performance on different types of terrain is measured, and it is found that open land and larger bodies of water are identified at a high rate, while trails are hard to recognize. It is further investigated how the different densities found in the source data affect the performance of the models, and found that some terrain types, trails for instance, benefit from higher density data. Other features of the terrain, like roads and buildings, are predicted with higher accuracy by lower density data. Finally, the certainty of the predictions is discussed and visualised by measuring the average entropy of predictions in an area. These visualisations highlight that although the predictions are far from perfect, the models are more certain about their predictions when they are correct than when they are not.
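The certainty measure mentioned at the end of this abstract, the average entropy of predictions in an area, can be sketched as follows. The abstract does not specify the logarithm base or how an area is delimited, so the natural logarithm and a plain list of per-pixel class probability vectors are assumptions here, as are the function names.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(predictions):
    """Average entropy over a collection of per-pixel class distributions."""
    return sum(entropy(p) for p in predictions) / len(predictions)
```

A prediction that puts all its mass on one class has entropy 0, while a uniform prediction over k classes has the maximum entropy log k, so a lower average indicates a more confident model over that area.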
|