1 |
A Bayesian test of independence of two categorical variables obtained from a small area: an application to BMD and BMI
Zhou, Jingran 19 December 2011
"Scientists usually need to understand the extent of the association of two attributes, and the data are typically presented in two-way categorical tables. In science, the chi-squared test is routinely used to analyze data from such tables. However, in many applications the chi-squared test can be defective. For example, when the sample size is small, the chi-squared test may not be applicable. The terms 'small area' and 'local area' are commonly used to denote a small geographical area, such as a county. If a survey has been carried out, the sample size within any particular small area may be too small to generate accurate estimates from the data, and a chi-squared test may be invalid (i.e., expected frequencies in some cells of the table are less than five). To deal with this problem we use Bayesian small area estimation, because it is used to 'borrow strength' from related or similar areas; it enhances the information of each area with common exchangeable information. We use a Bayesian model to estimate a Bayes factor to test the independence of the two variables. We apply the model to test for the independence between bone mineral density (BMD) and body mass index (BMI) from 31 counties and we compare the results with a direct Bayes factor test. We have also obtained numerical and sampling errors; both the numerical and sampling errors of our Bayes factor are small. Our model is shown to be much less sensitive to the specification of the prior distribution than the direct Bayes factor test which is based on each area only."
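As a rough illustration of what a direct, single-area Bayes factor test of independence can look like, the sketch below uses a standard conjugate setup that is an assumption here, not necessarily the thesis's model: multinomial sampling, a symmetric Dirichlet prior on the cell probabilities under dependence, and symmetric Dirichlet priors on the row and column margins under independence. The function names and the `prior` parameter are hypothetical. It does not implement the small-area model that pools information across counties.

```python
from math import lgamma


def log_multivariate_beta(values):
    """log B(v) = sum(log Gamma(v_i)) - log Gamma(sum(v_i))."""
    return sum(lgamma(v) for v in values) - lgamma(sum(values))


def log_bf_dependence(table, prior=1.0):
    """Log Bayes factor of dependence against independence for one r x c table.

    Assumption: multinomial sampling with symmetric Dirichlet(prior) priors on
    the cells (dependence) and on the row and column margins (independence).
    The multinomial coefficient is common to both models and cancels.
    """
    cells = [n for row in table for n in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    def log_marginal(counts):
        prior_vec = [prior] * len(counts)
        posterior_vec = [prior + n for n in counts]
        return log_multivariate_beta(posterior_vec) - log_multivariate_beta(prior_vec)

    log_m_dependent = log_marginal(cells)
    log_m_independent = log_marginal(row_totals) + log_marginal(col_totals)
    return log_m_dependent - log_m_independent


# Made-up counts: a perfectly diagonal table and a perfectly balanced one.
diagonal = log_bf_dependence([[10, 0], [0, 10]])
balanced = log_bf_dependence([[5, 5], [5, 5]])
```

A positive value favors dependence and a negative value favors independence. Because the result depends directly on `prior`, a per-area test of this kind can be sensitive to the prior specification, which is the weakness the abstract attributes to the direct test.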
|
2 |
The Effect of Maternal and Fetal Inbreeding on Dystocia, Calf Survival, Days to First Service and Non-Return Performance in U.S. Dairy Cattle
Adamec, Vaclav 17 January 2002
Intensive selection for increased milk production over many generations has led to growing genetic similarity and increased relationships in dairy populations. In the current study, inbreeding depression was estimated for number of days to first service, summit milk, conception by 70 days non-return, and calving rate with a linear mixed model (LMM) approach, and for calving difficulty and calf mortality with a Bayesian threshold model (BTM) for categorical traits. Effectiveness of classical and unknown parentage group procedures to estimate inbreeding coefficients was evaluated depending on completeness of a 5-generation pedigree. A novel method derived from the classical formula to estimate inbreeding was utilized to evaluate completeness of pedigrees. Two different estimates of maternal inbreeding were fitted in separate models as a linear covariate in combined LMM analyses (Holstein registered and grade cows and Jersey cows) or separate analyses (registered Holstein cows) by parity (1-4) with fetal inbreeding. Impact of inbreeding type, model, data structure, and treatment of herd-year-season (HYS) on magnitude and size of inbreeding depression were assessed. Grade Holstein datasets were sampled and analyzed by percentage of pedigree present (0-30%, 30-70% and 70-100%). BTM analyses (sire-mgs) were performed using Gibbs sampling for parities 1, 2 and 3 fitting maternal inbreeding only. In LMM analyses of grade data, the least pedigree and diagonal A matrix performed the worst. Significant inbreeding effects were obtained in most traits in cows of parity 1. Fetal inbreeding depression was mostly lower than that from maternal inbreeding. Inbreeding depression in binary traits was the most difficult to evaluate. Analyses with non-additive effects included in LMM, for data by inbreeding level and by age group, should be preferred to estimate inbreeding depression. In BTM, inbreeding effects were strongly related to dam parity and calf sex.
The largest effects were obtained from parity 1 cows giving birth to male calves (0.417% and 0.252% for dystocia and calf mortality) and then births to female calves (0.300% and 0.203% for dystocia and calf mortality). Female calves from mature cows were the least affected (0.131% and 0.005% for dystocia and calf mortality). Data structure was found to be a very important factor in attaining convergence in distribution. / Ph. D.
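For readers unfamiliar with how inbreeding coefficients are derived from a pedigree, here is a minimal sketch of the textbook tabular method for the additive relationship matrix A, where an animal's inbreeding coefficient is its diagonal element minus one. This is a generic illustration, not the novel completeness method or the unknown parentage group procedure from the thesis. The function name and the pedigree encoding (index pairs, `None` for an unknown parent) are assumptions.

```python
def inbreeding_coefficients(pedigree):
    """Return the inbreeding coefficient F of each animal (tabular method).

    pedigree: list of (sire, dam) index pairs, with parents listed before
    their offspring. None marks an unknown parent, which is treated here as
    an unrelated, non-inbred founder (an assumption of this sketch).
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]  # additive relationship matrix A
    for i, (sire, dam) in enumerate(pedigree):
        # Relationship with every earlier animal: mean of its relationships
        # with the two parents (an unknown parent contributes zero).
        for j in range(i):
            rel = 0.0
            if sire is not None:
                rel += 0.5 * a[j][sire]
            if dam is not None:
                rel += 0.5 * a[j][dam]
            a[i][j] = a[j][i] = rel
        both_known = sire is not None and dam is not None
        parents_rel = a[sire][dam] if both_known else 0.0
        a[i][i] = 1.0 + 0.5 * parents_rel
    return [a[i][i] - 1.0 for i in range(n)]


# Hypothetical pedigree: two founders, two full sibs, and a full-sib mating.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
f = inbreeding_coefficients(pedigree)
```

In this toy pedigree the last animal comes from a full-sib mating and gets F = 0.25. Treating unknown parents as unrelated founders tends to understate inbreeding when pedigrees are incomplete, which is why pedigree completeness matters for estimates like those in the abstract.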
|
3 |
The Effectiveness of Categorical Variables in Discriminant Function Analysis
Waite, Preston Jay 01 May 1971
A preliminary study of the feasibility of using categorical variables in discriminant function analysis was performed. Data including both continuous and categorical variables were used and predictive results examined.
The discriminant function techniques were found to be robust enough to include the use of categorical variables.
Some problems were encountered with using the trace criterion for selecting the most discriminating variables when these variables are categorical. No monotonic relationship was found to exist between the trace and the number of correct predictions.
This study did show that the use of categorical variables does have much potential as a statistical tool in classification procedures. (50 pages)
|
4 |
Analýza spotřebitelských úvěrů pomocí statistických metod / The consumer loans analysis using statistical methods
Božíková, Barbora January 2016
Consumer loans are among the loan products provided by banks. This diploma thesis focuses on the possibility of identifying risky clients with consumer loans using an available data set. The first part briefly describes the credit process and the theoretical basis of the statistical methods used in the empirical part. The second part investigates dependencies and describes the client structure. Discriminant analysis is then applied with the aim of identifying sorting criteria that can distinguish risky clients from unproblematic ones. Finally, the results of the analysis are evaluated and the identified relationships are described.
|
5 |
Modelos de regressão para variáveis categóricas ordinais com aplicações ao problema de classificação / Regression models for ordinal categorical variables with applications to the classification problem
Okura, Roberta Irie Sumi 11 April 2008
In this work, some methods to analyse data with an ordinal categorical response are presented. We describe the most important and widely used regression models that take the ordering of the response categories into account, such as cumulative models and sequential models. We also discuss the problem of how to discriminate and classify elements in ordinal groups, commenting on the most common predictors for this kind of data. We also present the technique known as optimal discriminant analysis and its improved version, based on the use of bootstrap methods. Finally, we apply some of the described techniques to real financial data, intending to classify possible consumers, on acquisition of a credit card, as high, medium and low risk customers. With this application, we discuss the advantages and disadvantages of the models used in terms of quality of classification.
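To make the idea of a cumulative model concrete, the following sketch computes category probabilities under a cumulative logit (proportional odds) model, P(Y <= k | x) = logistic(theta_k - eta), and classifies by the most probable category. The logit link, the sign convention, the threshold values and the function names are assumptions chosen for illustration. The abstract does not state which link or parameterization the thesis uses.

```python
from math import exp


def logistic(z):
    return 1.0 / (1.0 + exp(-z))


def cumulative_logit_probs(eta, thresholds):
    """Category probabilities under P(Y <= k | x) = logistic(theta_k - eta).

    eta: linear predictor x'beta.
    thresholds: increasing cut points theta_1 < ... < theta_{K-1}.
    Returns a list of K probabilities, one per ordered category.
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def classify(eta, thresholds):
    """Index of the most probable ordered category."""
    probs = cumulative_logit_probs(eta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical cut points for three ordered classes.
cuts = [-1.0, 1.0]
middle = cumulative_logit_probs(0.0, cuts)
```

With this sign convention a larger linear predictor shifts probability toward the higher categories. Classifying by the modal category is only one possible predictor for ordinal data; the median category is another common choice.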
|
7 |
GDPR's Impact on Sales at Flygresor.se: A Regression Analysis / GDPRs påverkan på försäljning hos Flygresor.se: en regressionsanalys
Lansryd, Lisette, Engvall Birr, Madeleine January 2019
The possible effects of the General Data Protection Regulation (GDPR) have been widely discussed among policymakers, stakeholders and ordinary people who are the subjects of data collection. The purpose of GDPR is to protect people's integrity and increase transparency about how personal data is used. Up until May 25th, 2018, personal data could be sampled and used without consent from users. Many argue that the introduction of GDPR is good; others are reluctant and argue that GDPR may harm data-driven companies. The report aims to answer how GDPR affects sales at the flight search engine Flygresor.se. By examining how and to what extent these regulations impact revenue, the hope is that the findings will lead to a deeper understanding of how these regulations affect businesses. Multiple linear regression analysis was used as the framework to answer the research question. Numerous models were constructed based on data provided by Flygresor.se. The models mostly included categorical variables representing time indicators such as month, weekday, etc. After carefully performing data modifications, variable selection and model evaluation tests, three final models were obtained. After performing statistical inference tests and multicollinearity diagnostics on the models, it could be concluded that an effect from GDPR could not be statistically proven. However, this does not mean that an actual effect of GDPR did not occur, only that it could not be isolated and proven. Thus, the extent of the effect of GDPR is statistically inconclusive.
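The modelling approach described above, a linear regression on categorical time indicators plus a GDPR effect, can be sketched as follows. This is a minimal illustration on synthetic, noise-free data, not the authors' models: the choice of weekday dummies with Monday as baseline, the coefficient values and all function names are assumptions, and the inference tests and multicollinearity diagnostics mentioned in the abstract are omitted. Only the 25 May 2018 date comes from the abstract.

```python
from datetime import date, timedelta

GDPR_START = date(2018, 5, 25)  # date given in the abstract


def design_row(day):
    """Intercept, six weekday dummies (Monday is the baseline), post-GDPR flag."""
    weekday_dummies = [1.0 if day.weekday() == k else 0.0 for k in range(1, 7)]
    return [1.0] + weekday_dummies + [1.0 if day >= GDPR_START else 0.0]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic daily "sales" built from made-up coefficients (last one is GDPR).
days = [date(2018, 4, 1) + timedelta(days=i) for i in range(120)]
rows = [design_row(d) for d in days]
true_beta = [100.0, 5.0, 3.0, 0.0, -2.0, 10.0, 12.0, -4.0]
sales = [sum(b * x for b, x in zip(true_beta, r)) for r in rows]
beta_hat = ols(rows, sales)
```

Dropping one level of each categorical variable (here Monday) avoids perfect collinearity with the intercept. On real, noisy data the coefficient of the post-GDPR flag would also need a standard error before any conclusion about an effect could be drawn.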
|
8 |
Geostatistical three-dimensional modeling of the subsurface unconsolidated materials in the Göttingen area / The transitional-probability Markov chain versus traditional indicator methods for modeling the geotechnical categories in a test site.
Ranjineh Khojasteh, Enayatollah 27 June 2013
The aim of this work was to build a three-dimensional subsurface model of the Göttingen region based on a geotechnical classification of the unconsolidated sediments. The materials studied range from loose sediments to solid rock, but in this work they are referred to as soil, soil classes, or soil categories.
This study evaluates different ways of capturing heterogeneous subsurfaces with geostatistical methods and simulations. Such modeling is a fundamental tool in geotechnics, mining, oil prospecting, and hydrogeology, among other fields.
Detailed modeling of the required continuous parameters, such as the porosity, permeability, or hydraulic conductivity of the subsurface, presupposes an exact determination of the boundaries of facies and soil categories. The focus of this work is the three-dimensional modeling of unconsolidated rock and its classification based on geostatistically determined parameters. The methods used were conventional pixel-based methods and transition-probability-based Markov chain models.
After a general statistical evaluation of the parameters, the presence or absence of a soil category along the boreholes is described by indicator parameters. The indicator of a category at a sample point is one if the category is present and zero if it is absent. Intermediate states can also be defined; for example, a value of 0.5 is assigned when two categories are present but their exact proportions are not known. To improve the stationarity properties of the indicator variables, the initial coordinates are transformed into a new system, proportional to the top and bottom of the corresponding model layer. In the new coordinate space, the indicator variograms are computed for each category in different spatial directions. For clarity, semivariograms are also referred to as variograms in this work.
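The indicator coding and the indicator variogram described above can be sketched for the simplest case, a single borehole sampled at equal depth intervals. The estimator is the usual experimental semivariogram, gamma(h) = sum of squared differences over pairs at lag h, divided by 2 N(h). The category names, the equal-spacing assumption and the function names are hypothetical; the coordinate transformation and the directional, three-dimensional variograms of the thesis are not reproduced.

```python
def indicator(categories, target):
    """1.0 where the sample belongs to the target category, else 0.0."""
    return [1.0 if c == target else 0.0 for c in categories]


def indicator_variogram(values, lag):
    """Experimental semivariogram for equally spaced samples.

    gamma(h) = sum((I(z_i) - I(z_{i+h}))**2) / (2 * N(h)),
    where lag is h in sample steps and N(h) is the number of pairs.
    """
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    if not pairs:
        raise ValueError("lag exceeds the sampled interval")
    return sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))


# Hypothetical borehole log, top to bottom.
log = ["sand", "sand", "clay", "clay"]
sand = indicator(log, "sand")
gamma_1 = indicator_variogram(sand, 1)
gamma_2 = indicator_variogram(sand, 2)
```

An intermediate indicator value such as 0.5 can be passed through the same estimator unchanged, since it only requires numeric values at each sample point.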
Indicator kriging is used to compute the probability of each category at a model node. Based on the probabilities computed in this step for the existence of each model category, the most probable category is assigned to the node. The indicator variogram models and indicator kriging parameters used were validated and optimized. The reduction of model nodes and its effect on the precision of the model were also investigated. To resolve small-scale variations of the categories, the developed methods were applied and compared. The simulation methods used were "Sequential Indicator Simulation" (SISIM) and the "Transition Probability Markov Chain" (TP/MC). The studies carried out show that the TP/MC method generally gives good results, particularly in comparison with the SISIM method. For comparison, alternative methods for similar problems are evaluated and their inefficiency is demonstrated.
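Two of the ideas in this paragraph lend themselves to a small sketch: estimating transition probabilities between categories, and assigning the most probable category to a node. The code below estimates one-step transition probabilities from a single categorical log and picks the category with the highest probability from a given set. It is a simplified, discrete illustration with hypothetical names and data; the TP/MC approach in geostatistics works with transition probabilities as functions of lag distance and direction, and the node probabilities in the thesis come from indicator kriging, neither of which is implemented here.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """One-step transition probabilities estimated from a category sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def most_probable(probabilities):
    """Category with the highest probability at a node."""
    return max(probabilities, key=probabilities.get)


# Hypothetical borehole log and hypothetical node probabilities.
log = ["sand", "sand", "clay", "sand"]
p = transition_probabilities(log)
node_category = most_probable({"sand": 0.2, "clay": 0.7, "gravel": 0.1})
```

The last category in a log has no successor, so it contributes no row of its own unless it also appears earlier in the sequence.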
An improvement of the TP/MC method is also described and supported with results, and further suggestions for modifying the methods are given. Based on the results, the method is recommended for similar problems. For this purpose, simulation selection, tests, and evaluation systems are proposed, and further research priorities are highlighted.
A computer-aided implementation of the procedure covering all simulation steps could be developed in the future to increase efficiency.
The results of this study and subsequent investigations could be relevant to a wide range of problems in mining, the petroleum industry, geotechnics, and hydrogeology.
|