About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
61

A distribuição normal-valor extremo generalizado para a modelagem de dados limitados no intervalo unitário (0,1) / The normal-generalized extreme value distribution for the modeling of data restricted in the unit interval (0,1)

Benites, Yury Rojas 28 June 2019
In this research, a new statistical model is introduced for data restricted to the continuous interval (0, 1). The proposed model is constructed through a transformation of variables, in which the transformed variable results from combining a variable with a standard normal distribution and the cumulative distribution function of the generalized extreme value distribution. The structural properties of the new model are studied. The new family is extended to regression models, in which the model is reparameterized in terms of the median of the response variable, and the median and the dispersion parameter are related to covariates through a link function. Inferential procedures are developed from both classical and Bayesian perspectives.
Classical inference is based on maximum likelihood theory, and Bayesian inference on the Markov chain Monte Carlo method. In addition, simulation studies were performed to evaluate the performance of the classical and Bayesian estimates of the model parameters. Finally, a colorectal cancer data set is analyzed to show the applicability of the model.
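The abstract does not give the exact transformation. Here is a minimal sketch that assumes the construction Y = G_ξ(μ + σZ), where Z ~ N(0, 1) and G_ξ is the standard GEV cumulative distribution function. This specific form and the names `mu`, `sigma`, `xi` are hypothetical and chosen for illustration. The sketch simulates from that construction and checks the property that makes a median reparameterization convenient: G_ξ is non-decreasing, so the median of Y equals G_ξ(μ).

```python
import math
import random
import statistics


def gev_cdf(x: float, xi: float) -> float:
    """CDF of the standard GEV distribution (location 0, scale 1, shape xi)."""
    if xi == 0.0:
        # Gumbel limit of the GEV family
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # x lies outside the GEV support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def sample_normal_gev(n: int, mu: float, sigma: float, xi: float, seed: int = 0) -> list[float]:
    """Hypothetical construction: Y = G_xi(mu + sigma * Z) with Z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [gev_cdf(mu + sigma * rng.gauss(0.0, 1.0), xi) for _ in range(n)]


def model_median(mu: float, xi: float) -> float:
    # G_xi is non-decreasing, so median(Y) = G_xi(mu + sigma * median(Z)) = G_xi(mu)
    return gev_cdf(mu, xi)


if __name__ == "__main__":
    ys = sample_normal_gev(20_000, mu=0.5, sigma=1.0, xi=0.0)
    print(f"sample median {statistics.median(ys):.4f}, model median {model_median(0.5, 0.0):.4f}")
```

In this reading, Y stays strictly inside (0, 1) only in the Gumbel case (ξ = 0). For ξ ≠ 0, values of μ + σZ outside the GEV support map to exactly 0 or 1. That is one reason the construction used in the thesis may differ from this sketch.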
62

Técnicas de diagnósticos em modelos espaciais lineares gaussianos / Diagnostic techniques in Gaussian linear spatial models

Borssoi, Joelmir André 04 December 2007
Conselho Nacional de Desenvolvimento Científico e Tecnológico / Concepts for monitoring and managing the agricultural production process are increasingly used as a management strategy in agriculture. These concepts take into account the spatial variability of the variables under study. In the geostatistical approach, modeling the spatial dependence structure is of fundamental importance for estimating the parameters that define this structure, which are then used by kriging to interpolate values at unsampled locations. However, parameter estimation can be strongly affected by atypical observations in the sampled data. The aim of this work was to apply diagnostic techniques for Gaussian linear spatial models, as used in geostatistics, to evaluate how sensitive the maximum likelihood (ML) and restricted maximum likelihood (REML) estimators are to small perturbations in the data. Studies were carried out with simulated data, with data from the literature, and with experimental data collected in a commercial agricultural area in western Paraná. The simulation study showed that the diagnostic techniques were effective in identifying the perturbed data. The REML estimator produced more robust estimates of the spatial dependence parameters. From the results with real data, it was concluded that atypical values among the sampled data can strongly influence thematic maps, thereby changing the spatial dependence. Diagnostic techniques should be part of any geostatistical analysis, so that the information in thematic maps is of higher quality and can be used by the farmer with greater confidence.
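The abstract links atypical observations to distorted thematic maps. The sketch below shows the mechanism through which one atypical value reaches a kriged map. It computes a simple-kriging prediction under an exponential covariance, with the mean, sill and range treated as known. Those values and the helper names are assumptions for illustration. The thesis estimates the parameters by ML and REML, and that step is not shown. With the parameters fixed, the predictor is linear in the data. Perturbing one observation by δ therefore shifts the prediction by exactly that observation's kriging weight times δ.

```python
import numpy as np


def exponential_cov(h, sill: float, corr_range: float):
    """Exponential covariance model C(h) = sill * exp(-h / corr_range), no nugget."""
    return sill * np.exp(-np.asarray(h) / corr_range)


def simple_kriging(coords, z, x0, mean: float, sill: float, corr_range: float):
    """Simple-kriging prediction at x0 assuming a known constant mean.

    Returns (prediction, weights).
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    # Distances between sampled locations, and from each location to x0
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - x0, axis=-1)
    # Kriging weights solve C w = c0
    weights = np.linalg.solve(exponential_cov(d, sill, corr_range),
                              exponential_cov(d0, sill, corr_range))
    return mean + weights @ (z - mean), weights


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
    z = [10.0, 11.0, 9.5, 10.5, 12.0]
    pred, w = simple_kriging(coords, z, (0.5, 0.5), mean=10.0, sill=1.0, corr_range=1.5)
    z_atypical = list(z)
    z_atypical[0] += 5.0  # a single atypical observation
    pred_atypical, _ = simple_kriging(coords, z_atypical, (0.5, 0.5), mean=10.0, sill=1.0, corr_range=1.5)
    # The shift equals the weight of the perturbed point times the perturbation
    print(pred_atypical - pred, w[0] * 5.0)
```

The influence of the perturbation on the map therefore depends on where the perturbed point sits relative to each prediction location. This sketch has no nugget effect, so kriging is an exact interpolator at sampled locations.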
64

Estimating the Early Evolution of Brachiopods Using an Integrated Approach Combining Genomics and Fossils / En uppskattning av armfotingarnas tidiga evolution med hjälp av genomik och fossil

Robert, Chloé January 2019
The Brachiopoda, a major group of the Lophotrochozoa, experienced a rapid early evolutionary diversification during the well-known Cambrian explosion and subsequently dominated the Palaeozoic benthos in diversity and abundance. Even though the phylogeny of the Lophotrochozoa is still hotly debated, the Brachiopoda are now known to be a monophyletic group. However, the early evolutionary rates of the Brachiopoda have never been studied in a framework that combines molecular data with fossil time-calibration points. To investigate the expected higher evolutionary rates of the phylum at its origin, we conducted phylogenetic studies combining different methodologies and datasets. The work is founded on Maximum Likelihood and Bayesian analyses of 18S and 28S rRNA datasets, followed by analyses of phylogenomic sequences. All material was obtained from previously available sequences and from sequencing of specimens gathered in a concerted worldwide collection effort. The analyses of the phylogenomic dataset produced a robust, well-supported phylogeny of the Brachiopoda. However, both the novel rRNA and the phylogenomic dating analyses of the newly assembled dataset provided limited insight into the early rates of brachiopod evolution, revealing some limitations of calibration dating with the software package BEAST2. Future studies implementing fossil calibration, possibly incorporating morphological data, should be attempted to elucidate the early rates of evolution of the Brachiopoda and the effect of the Push of the Past in this clade. / It is often assumed that evolution (changes in the genetic material of a group of organisms) proceeds at a constant rate, but ultimately it remains uncertain whether this is the case. Large groups of organisms have often been associated with a higher evolutionary rate, especially near their origin, which increases their probability of survival.
Brachiopods (Brachiopoda) are shelled marine invertebrates that were once widespread; today, bivalves (Bivalvia) are considerably more widespread. Brachiopods have existed and evolved over many millions of years, originating in the early Cambrian. Years of research and many fossils have given us more information about the appearance of extinct organisms, contributing to a thousandfold increase in the number of known fossil species. Recently, innovations in molecular techniques have also made it possible to apply this knowledge to extinct species. These molecular techniques have recently helped to determine some relationships within the brachiopods that were previously considered very difficult to resolve. Some relationships within the brachiopods remain unknown, and it is not yet known how fast they evolved. Investigating the rate of evolution can help explain the group's early success during the Cambrian and Ordovician and the decline that followed. The aim of this study was to estimate the rate of evolution in brachiopods, with particular focus on the group's early diversification. To investigate this, we used molecular data to analyze relationships within the brachiopods. We also used fossils to date major events in their evolutionary history. Using statistical analyses, we estimated the rate of evolution and the relationships within the group. We concluded that brachiopods descend from a common ancestor. The timing of this origin could not be established, because it was estimated to lie millions of years before the oldest animal fossil. More research will be needed to determine whether brachiopods had a higher evolutionary rate early in their history.
65

Métodos de estimação em regressão logística com efeito aleatório: aplicação em germinação de sementes / Estimation methods in logistic regression with random effects: application in seed germination

Araujo, Gemma Lucia Duboc de 01 February 2012
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In logistic mixed models, including a random effect on the intercept captures the effects of sources of variation arising from the particular characteristics of a group (heterogeneity). This deflates the pure error and makes the model intercept fluctuate. The inclusion adds complexity to the estimation methods and also changes how the parameters are interpreted: originally interpreted through the odds ratio, they are then read through the median odds ratio. The parameters of a mixed model can be estimated by many methods with varying performance, such as the Laplace approximation, maximum likelihood (ML) and restricted maximum likelihood (REML). The objective of this work was to examine, in logistic mixed models with a random effect on the intercept, the consequences for the interpretation of the parameters, the quality of the experiment and the classification of treatments via the median odds ratio. A further objective was to assess the performance of the estimation methods listed above. The analyses were performed first by simulation and then on a real data set from a seed germination experiment with physic nut (Jatropha curcas L.). For the logistic mixed model with a random effect on the intercept, the REML estimation method performed best. The variance of the random effect affected the performance of every method evaluated, with performance inversely related to that variance. Further studies are suggested to determine more precisely how the inflection points and the median effective level influence the performance of the methods.
The experiment evaluated physic nut seed germination on four substrates: rolled paper, on paper, on sand and between sand. There, including the random effect in the logistic model revealed considerable heterogeneity in germination across different units of the same substrate. The median odds ratio showed the superiority of the between-sand substrate over the on-paper substrate for physic nut seed germination, a result similar to that obtained with Tukey's test.
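The abstract interprets the random-intercept variance through the median odds ratio (MOR). Here is a minimal sketch, assuming the usual definition for a normally distributed random intercept u ~ N(0, σ²_u). Whether the thesis uses exactly this form is an assumption. Under that definition, MOR = exp(√(2σ²_u) · Φ⁻¹(0.75)): the median of the odds ratio between two randomly chosen groups (clusters), with the higher-propensity group in the numerator. A Monte Carlo function checks the closed form against that definition.

```python
import math
import random
import statistics


def median_odds_ratio(var_u: float) -> float:
    """MOR for a normal random intercept u ~ N(0, var_u) in a logistic mixed model."""
    if var_u < 0:
        raise ValueError("the random-intercept variance must be non-negative")
    # u_j - u_k ~ N(0, 2 * var_u), and the median of |N(0, s^2)| is s * Phi^{-1}(0.75)
    return math.exp(math.sqrt(2.0 * var_u) * statistics.NormalDist().inv_cdf(0.75))


def median_odds_ratio_mc(var_u: float, n: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo check: median odds ratio between two randomly drawn groups,
    with the higher-propensity group in the numerator."""
    rng = random.Random(seed)
    sd = math.sqrt(var_u)
    ratios = [math.exp(abs(rng.gauss(0.0, sd) - rng.gauss(0.0, sd))) for _ in range(n)]
    return statistics.median(ratios)
```

A random-intercept variance of 1 gives an MOR of about 2.6. The MOR equals 1 only when there is no between-group heterogeneity.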
66

Optimalizace tvorby trénovacího a validačního datasetu pro zvýšení přesnosti klasifikace v dálkovém průzkumu Země / Training and validation dataset optimization for Earth observation classification accuracy improvement

Potočná, Barbora January 2019
This thesis deals with training and validation datasets for improving the accuracy of Earth observation classification. Experiments with training and validation data for two classification algorithms (Maximum Likelihood - MLC and Support Vector Machine - SVM) are carried out on a forest-meadow landscape in the foothills of the Giant Mountains (Podkrkonoší). The thesis is based on the assumption that 1/3 training data and 2/3 validation data is the ideal ratio for achieving maximal classification accuracy (Foody, 2009). Another hypothesis was that SVM classification requires fewer training points than the MLC algorithm to achieve the same or similar classification accuracy (Foody, 2004). The main goal of the thesis was to test how the proportion and amount of training and validation data influence the classification accuracy of Sentinel-2A multispectral data using the MLC algorithm. The highest overall accuracy with the MLC classification algorithm was achieved with 375 training and 625 validation points; the overall accuracy for this ratio was 72.88%. The theory of Foody (2009) that 1/3 training data and 2/3 validation data is the ideal ratio for achieving the highest classification accuracy was confirmed by the overall accuracy and...
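In its usual remote-sensing form, the maximum likelihood classifier (MLC) is a Gaussian classifier. Each class is modelled by a multivariate normal distribution with its own mean vector and covariance matrix, both estimated from the training samples. Each pixel is then assigned to the class with the highest log-likelihood. The following NumPy sketch makes that assumption and uses equal class priors. It also includes a random 1/3 training and 2/3 validation split and an overall-accuracy score. All helper names are hypothetical; this is not the thesis's own workflow.

```python
import numpy as np


def fit_mlc(X: np.ndarray, y: np.ndarray) -> dict:
    """Estimate a mean vector and covariance matrix per class from training samples."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params


def predict_mlc(params: dict, X: np.ndarray) -> np.ndarray:
    """Assign each sample to the class with the largest Gaussian log-likelihood (equal priors)."""
    classes = list(params)
    scores = []
    for c in classes:
        mu, cov = params[c]
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance for every row of X
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        scores.append(-0.5 * (logdet + maha))  # constant term dropped
    return np.asarray(classes)[np.argmax(np.vstack(scores), axis=0)]


def split_train_validation(n: int, train_fraction: float = 1 / 3, seed: int = 0):
    """Random index split; 1/3 training and 2/3 validation by default."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return idx[:n_train], idx[n_train:]


def overall_accuracy(y_true, y_pred) -> float:
    """Share of validation samples whose predicted class matches the reference."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Varying `train_fraction` and the total number of labelled points reproduces the kind of experiment the abstract describes, on whatever data is supplied.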
67

Zobecněné odhadovací rovnice (GEE) / Generalized estimating equations

Sotáková, Martina January 2020
In this thesis we study generalized estimating equations (GEE). First, we introduce generalized linear models, on which generalized estimating equations are based. Next, we present the methods of pseudo maximum likelihood and quasi-pseudo maximum likelihood, from which we move on to the method of generalized estimating equations. Finally, we perform simulation studies that demonstrate the theoretical results presented in the thesis.
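The simplest GEE case shows how the method works. Take a linear model (Gaussian family, identity link) with an independence working correlation. There, the estimating equations Σᵢ Xᵢᵀ(yᵢ − Xᵢβ) = 0 are the ordinary least squares normal equations. The variance is then estimated with the cluster-robust sandwich formula A⁻¹BA⁻¹, where A = ΣᵢXᵢᵀXᵢ and B = ΣᵢXᵢᵀrᵢrᵢᵀXᵢ. This NumPy sketch covers only that case. Other working correlations (e.g. exchangeable or AR(1)) and non-identity links need an iterative solver, which is not shown. The function name is hypothetical.

```python
import numpy as np


def gee_independence_linear(X: np.ndarray, y: np.ndarray, groups: np.ndarray):
    """GEE for a Gaussian/identity model with independence working correlation.

    Returns the coefficient estimates and the robust (sandwich) covariance matrix.
    """
    # With independence working correlation and identity link, the estimating
    # equations sum_i X_i^T (y_i - X_i beta) = 0 are the OLS normal equations.
    bread = X.T @ X
    beta = np.linalg.solve(bread, X.T @ y)
    resid = y - X @ beta
    # "Meat": sum over clusters of X_i^T r_i r_i^T X_i
    meat = np.zeros_like(bread)
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv
```

The point estimates ignore the within-cluster correlation, but the sandwich variance still accounts for it. This is the robustness property that motivates GEE.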
68

Identifiering av den invasiva lupinen (Lupinus polyphyllus) : Övervakning av blomsterlupiner längs vägkanter med hjälp av högupplösta UAV-data och GIS / Identifying the invasive Lupinus flower (Lupinus polyphyllus) : Monitoring the growth of Lupinus flowers along roads using high-resolution UAV images and GIS

Petersen, Pontus January 2022
Sweden's roadside ditches and verges are home to many flowers and plant species. The lupine Lupinus polyphyllus is an invasive plant species that arrived in Sweden in the 19th century. Its characteristics allow it to outcompete other plant species, which harms Swedish biodiversity. Naturvårdsverket and Trafikverket monitor and manage the spread of lupines in Sweden. However, no digital monitoring system is in place, and the authorities rely heavily on reported lupine sightings. This study explored methods and parameters for identifying lupines in high-resolution UAV photos using GIS and classification. The main tasks were to examine how well the random forest (RF) and maximum likelihood (MLC) classification methods identify lupines, and which UAV flight altitude and image segmentation settings should be chosen. The processing time of each method was also recorded. Only supervised classification within ESRI ArcGIS Pro was performed. The study used raster data collected by two UAVs along two separate 200 m road sections, at flight altitudes from 10 to 120 m. Segmentation was performed with spectral detail levels of 1, 5, 10, 15 and 20 over a smaller test area at a flight altitude of 20 m. The MLC and RF classifiers were tested on each segmentation. Based on these results, one classification method was selected and used to test flight altitudes to find the optimal one. The altitudes tested were 20 m, 50 m and 85 m, and the time taken was recorded for each process. The results were checked with a confusion matrix and overclassification to identify the most efficient and accurate method. The segmentation results showed that MLC generally gave the best results, with overclassification between +1% and +3% and accuracy above 90%.
RF gave overclassification of +1% to +9%, with accuracy also above 90%. The flight-altitude tests showed the following accuracy and overclassification for each altitude:
20 m: accuracy 97%, overclassification 4.04%
50 m: accuracy 99%, overclassification 8.17%
85 m: accuracy 53%, overclassification 4.19%
The timing showed that the object-based method was about 33% faster than the pixel-based one. No large differences in processing time were found between the classification methods. Overall, the results showed that an object-based MLC method at 20 m gave the best results and was the fastest to run. Altitudes of 30 or 40 m might give equally good results, but these altitudes were not available for testing. The differences in classification accuracy between RF and MLC were marginal. / Roadsides in Sweden are home to many different plant species. The lupine Lupinus polyphyllus is an invasive species originally from North America. Naturvårdsverket and Trafikverket are responsible for monitoring and managing the spread of lupines in Sweden. This study examined the use of GIS and aerial photos in lupine control. More specifically, it examined which parameters and classification methods are suitable for identifying Lupinus polyphyllus. The two main classification methods were random forest (RF) and maximum likelihood classifiers (MLC). Other factors were the altitude of the UAV collecting the photos and which segmentation parameters were optimal for classification. The processing time for the different parameters and methods was also recorded. The study used raster data from two drones at altitudes from 10 m to 120 m, and the tests were performed in ArcGIS Pro. The segmentation spectral detail levels tested were 1, 5, 10, 15 and 20. These were tested on a smaller area with a flight altitude of 20 m, and both RF and MLC were tested at all detail levels.
Based on these tests, a classification method and segmentation parameters were chosen and tested at different flight altitudes: 20, 50 and 85 m. A confusion matrix and the overestimation of classes were used to determine accuracy and overclassification. The results show that supervised object-based MLC on a raster generated from a 20 m flight altitude generally gave the best results. In this case, the accuracy was around 90% and overclassification was around 1 to 3%. Object-based classification was around 33% faster than pixel-based classification, but the choice of classification method did not noticeably affect processing time. However, a flight altitude of 30 or 40 m might give results as good as 20 m, but those altitudes were not available for testing. The difference between RF and MLC was not large, although the required accuracy and overclassification limits may be stricter depending on the needs of the user.
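The accuracy assessment above combines a confusion matrix with an "overclassification" measure. Overall accuracy is the share of correctly classified samples, i.e. the confusion-matrix diagonal divided by the total. The abstract does not define overclassification. The sketch below assumes a hypothetical definition: the relative excess of samples assigned to a class compared with the reference, (predicted − reference) / reference. The labels and helper names are also illustrative.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows: reference class; columns: predicted class."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def overall_accuracy(m) -> float:
    """Correctly classified samples (diagonal) divided by all samples."""
    total = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / total


def overclassification(y_true, y_pred, label) -> float:
    """Hypothetical definition: (predicted count - reference count) / reference count."""
    ref = Counter(y_true)[label]
    pred = Counter(y_pred)[label]
    return (pred - ref) / ref
```

Suppose 10 reference lupine samples are classified as 9 lupine plus 1 other, and 2 of the 90 "other" samples are labelled lupine. The overall accuracy is then 0.97, and lupine is overclassified by 10% under this definition.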
69

Bayesian Networks for Modelling the Respiratory System and Predicting Hospitalizations

Lopo Martinez, Victor January 2023
Bayesian networks can be used to model the respiratory system. Their structure indicates how risk factors, symptoms, and diseases are related, and the Conditional Probability Tables enable predictions about a patient's need for hospitalization. Numerous structure learning algorithms exist for discerning the structure of a Bayesian network, but none is guaranteed to find the perfect structure. Employing multiple algorithms can reveal relationships between variables that might otherwise remain hidden when relying on a single algorithm. The Maximum Likelihood Estimator is the predominant algorithm for learning the Conditional Probability Tables. However, it suffers from the data fragmentation problem, which can compromise its predictions. Failing to hospitalize patients who require specialized medical care could have severe consequences. Therefore, this thesis proposes using an XGBoost model to learn the conditional probabilities, as a novel and better method that does not suffer from data fragmentation. A Bayesian network is constructed by combining several structure learning algorithms, and the predictive performance of the Maximum Likelihood Estimator and XGBoost is compared. In predicting future patient hospitalization, XGBoost achieved a maximum accuracy of 86.0%, compared with an accuracy of 81.5% for the Maximum Likelihood Estimator. In this way, the predictive performance of Bayesian networks has been enhanced.
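The Maximum Likelihood Estimator for a conditional probability table is a normalized count: P(X = x | parents = π) = N(x, π) / N(π). The stdlib sketch below shows how that estimate becomes undefined when a parent configuration never appears in the data. This is the data fragmentation problem the abstract refers to. The number of parent configurations grows multiplicatively with the number of parents, so many cells receive few or no records. The variable names (`smoker`, `fever`, `hospitalized`) are hypothetical.

```python
from collections import Counter, defaultdict


def mle_cpt(records, child, parents):
    """MLE of P(child | parents) from a list of dict records.

    Returns {parent_config: {child_value: probability}}. Parent configurations
    that never occur in the data are absent, because the MLE is undefined there.
    """
    joint = Counter()
    marginal = Counter()
    for r in records:
        config = tuple(r[p] for p in parents)
        joint[(config, r[child])] += 1
        marginal[config] += 1
    cpt = defaultdict(dict)
    for (config, value), n in joint.items():
        cpt[config][value] = n / marginal[config]
    return dict(cpt)


if __name__ == "__main__":
    records = [
        {"smoker": 1, "fever": 1, "hospitalized": 1},
        {"smoker": 1, "fever": 1, "hospitalized": 0},
        {"smoker": 1, "fever": 1, "hospitalized": 1},
        {"smoker": 0, "fever": 0, "hospitalized": 0},
        {"smoker": 0, "fever": 1, "hospitalized": 0},
    ]
    # (smoker=1, fever=0) never occurs, so its row is missing from the table
    print(mle_cpt(records, "hospitalized", ["smoker", "fever"]))
```

Even with only two binary parents, one of the four rows is missing, and two other rows rest on a single record each. A model that shares information across configurations, such as the XGBoost approach proposed in the thesis, does not need every cell to be populated.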
70

Assessment of Modern Statistical Modelling Methods for the Association of High-Energy Neutrinos to Astrophysical Sources / Bedömning av moderna statistiska modelleringsmetoder för associering av högenergetiska neutroner till astrofysiska källor

Minoz, Valentin January 2021
The search for the sources of astrophysical neutrinos is a central open question in particle astrophysics. Thanks to substantial experimental efforts, we now have large-scale neutrino detectors in the oceans and polar ice. The neutrino sky seems mostly isotropic, but hints of possible source-neutrino associations have started to emerge, leading to much excitement within the astrophysics community. As more data are collected and future experiments are planned, the question of how to quantify point-source detection statistically in a robust way becomes increasingly pertinent. The standard approach to null-hypothesis testing reports results as a p-value, with detection typically corresponding to surpassing the coveted 5-sigma threshold. Although widely used, p-values and significance thresholds are notorious in the statistical community for being hard to interpret and potentially misleading. We explore an alternative Bayesian approach to reporting point-source detection, and its connections with and differences from the frequentist view. In this thesis, two methods for associating neutrino events with candidate sources are implemented on data from a simplified simulation of high-energy neutrino generation and detection. One is a maximum likelihood-based method that has been used in some high-profile articles. The other uses Bayesian hierarchical modelling with Hamiltonian Monte Carlo to sample the joint posterior of key parameters. Both methods are applied to a set of test cases to gauge their differences and similarities on identical data. The comparisons suggest that the Bayesian approach is applicable as an alternative or complement to the frequentist approach, and illustrate how the two approaches differ. The thesis also discusses the applicability and validity of the study itself and some potential benefits of incorporating a Bayesian framework, with suggestions for additional aspects to analyze.
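The commonly used unbinned maximum likelihood method is assumed here; whether the thesis uses exactly this variant is unconfirmed. It treats the N events as a mixture of signal and background: L(n_s) = ∏ᵢ [(n_s/N)Sᵢ + (1 − n_s/N)Bᵢ]. Here Sᵢ and Bᵢ are the signal and background probability densities of event i. The test statistic TS = 2 ln[L(n̂_s)/L(0)] is then converted to a p-value or significance. The stdlib sketch below takes the densities as given numbers and fits n_s on [0, N] by golden-section search. The log-likelihood is a sum of logarithms of affine functions of n_s, which makes it concave and hence unimodal. The significance conversion is not shown.

```python
import math


def log_likelihood(ns: float, S: list[float], B: list[float]) -> float:
    """Mixture log-likelihood for ns signal events among N = len(S) events."""
    f = ns / len(S)
    return sum(math.log(f * s + (1.0 - f) * b) for s, b in zip(S, B))


def fit_ns(S: list[float], B: list[float], tol: float = 1e-8) -> tuple[float, float]:
    """Maximize the concave log-likelihood over 0 <= ns <= N by golden-section search.

    Returns (ns_hat, TS) with TS = 2 * [log L(ns_hat) - log L(0)].
    """
    lo, hi = 0.0, float(len(S))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        # Keep the sub-interval that must contain the maximum
        if log_likelihood(a, S, B) < log_likelihood(b, S, B):
            lo = a
        else:
            hi = b
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    ns_hat = 0.5 * (lo + hi)
    ts = 2.0 * (log_likelihood(ns_hat, S, B) - log_likelihood(0.0, S, B))
    return ns_hat, max(ts, 0.0)


if __name__ == "__main__":
    # Toy input: 5 events with high signal density among 100 events
    S = [20.0] * 5 + [0.5] * 95
    B = [1.0] * 100
    print(fit_ns(S, B))
```

For this toy input, setting the derivative of the log-likelihood to zero gives n̂_s = 5 exactly. If every event has Sᵢ = Bᵢ, the likelihood is flat and TS is 0.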
