  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
341

Empirical likelihood and mean-variance models for longitudinal data

Li, Daoji January 2011
Improving estimation efficiency has always been an important aspect of statistical modelling. Our goal is to develop new statistical methodologies that yield more efficient estimators in the analysis of longitudinal data. In this thesis, we consider two approaches to improving estimation efficiency: empirical likelihood, and joint modelling of the mean and variance. In Part I, empirical likelihood-based inference for longitudinal data within the framework of generalized linear models is investigated. The proposed procedure takes the within-subject correlation into account without directly estimating nuisance parameters in the correlation matrix, and it retains optimality even if the working correlation structure is misspecified. The proposed approach yields more efficient estimators than conventional generalized estimating equations and achieves the same asymptotic variance as methods based on quadratic inference functions. Part II focuses on joint mean-variance models. We propose a data-driven approach to modelling the mean and variance simultaneously, which yields more efficient estimates of the mean regression parameters than the conventional generalized estimating equations approach even if the within-subject correlation structure is misspecified. Joint mean-variance models in both parametric and semi-parametric form are investigated. Extensive simulation studies are conducted to assess the performance of the proposed approaches. Three longitudinal data sets, the Ohio children's wheeze status data (Ware et al., 1984), the cattle data (Kenward, 1987) and the CD4+ data (Kaslow et al., 1987), are used to demonstrate our models and approaches.
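To make the empirical likelihood idea concrete, here is a minimal sketch for the simplest possible case, a scalar mean of independent observations. It is not the thesis's procedure for longitudinal generalized linear models; the function name `el_log_ratio` and the bisection solver are illustrative assumptions.

```python
import math


def el_log_ratio(xs, mu, iterations=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean.

    Illustrative scalar case only (hypothetical helper, not from the thesis).
    """
    d = [x - mu for x in xs]
    if min(d) >= 0 or max(d) <= 0:
        return math.inf  # mu lies outside the convex hull of the data

    # The Lagrange multiplier lam solves g(lam) = sum d_i / (1 + lam * d_i) = 0.
    # g is strictly decreasing on the interval where every weight stays positive.
    lo = -1.0 / max(d)
    hi = -1.0 / min(d)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def g(lam):
        return sum(di / (1.0 + lam * di) for di in d)

    for _ in range(iterations):  # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)
```

The statistic is zero at the sample mean and grows as the candidate mean moves away from it; no variance or correlation parameter is estimated along the way, which is the feature the abstract highlights.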
342

Modelos de regressão beta retangular heteroscedásticos aumentados em zeros e uns / Zero-one augmented heteroscedastic rectangular beta regression models

Silva, Ana Roberta dos Santos, 1989- 26 August 2018
Advisor: Caio Lucidius Naberezny Azevedo / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica / Made available in DSpace on 2018-08-26T19:30:15Z (GMT). Previous issue date: 2015 / Abstract: In this work we develop the zero-one augmented rectangular beta distribution, together with a corresponding zero-one augmented rectangular beta regression model, to analyze limited-augmented data (represented by mixed random variables with bounded support) that contain outliers. We develop inference tools under both the Bayesian and frequentist approaches. For Bayesian inference, because the posterior distributions of interest cannot be obtained analytically, we use MCMC algorithms. For frequentist estimation, we use the EM algorithm. We develop residual analysis techniques based on randomized quantile residuals under both the frequentist and Bayesian approaches. We also develop influence measures, under the Bayesian approach only, based on the Kullback-Leibler measure. In addition, we adapt posterior predictive checking methods available in the literature to our model, using appropriate discrepancy measures. For model comparison, we use the criteria commonly employed in the literature, such as AIC, BIC and DIC. We perform several simulation studies, covering situations of practical interest, to compare the Bayesian and frequentist estimates and to evaluate the behavior of the diagnostic tools developed. A real psychometric data set is analyzed to illustrate the potential of the tools developed. / Master's / Statistics / Master in Statistics
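As a rough illustration of what "zero-one augmented" means, the sketch below puts point masses at 0 and 1 and spreads the remaining probability over (0, 1) with a rectangular beta density, written here as a uniform-beta mixture. The parametrization (`p0`, `p1`, `theta`, `a`, `b`) is an assumption made for the example; the dissertation may parametrize the model differently, for instance through a mean and a precision.

```python
import math


def beta_pdf(y, a, b):
    """Beta(a, b) density on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))


def zoa_rect_beta(y, p0, p1, theta, a, b):
    """Zero-one augmented rectangular beta (assumed parametrization).

    Returns a probability mass at y == 0 or y == 1 and a density on (0, 1).
    theta is the weight of the uniform component, which thickens the tails.
    """
    if y == 0:
        return p0
    if y == 1:
        return p1
    if not 0 < y < 1:
        return 0.0
    return (1 - p0 - p1) * (theta + (1 - theta) * beta_pdf(y, a, b))
```

The uniform component is what lets the model tolerate outlying proportions, while `p0` and `p1` absorb the exact zeros and ones that a plain beta model cannot represent.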
343

Modelos para dados censurados sob a classe de distribuições misturas de escala skew-normal / Censored regression models under the class of scale mixture of skew-normal distributions

Massuia, Monique Bettio, 1989- 03 June 2015
Advisor: Víctor Hugo Lachos Dávila / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica / Made available in DSpace on 2018-08-26T19:55:07Z (GMT). Previous issue date: 2015 / Abstract: The main goal of this work is to present linear regression models with censored responses under the class of scale mixtures of skew-normal (SMSN) distributions, generalizing the classical Tobit model by offering more robust alternatives to the normal distribution. A classical inference study is developed for these models under two special cases of this family of distributions, the normal and the Student's t, using the EM algorithm to obtain maximum likelihood estimates of the model parameters and developing global and local influence diagnostics based on the methodology proposed by Cook (1986) and Poon & Poon (1999). Under the Bayesian approach, the censored regression model is studied under several special cases of the SMSN class: the normal, Student's t, skew-normal, skew-t and skew-slash distributions. In this case, the Gibbs sampler is the main tool used for inference about the model parameters. We also present simulation studies to evaluate the methodology developed, which is finally applied to two real data sets. The R packages SMNCensReg, CensRegMod and BayesCR provide computational support for this work. / Master's / Statistics / Master in Statistics
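For reference, the classical Tobit model that this work generalizes has a simple likelihood: observed responses contribute a normal density, censored ones contribute a normal tail probability. The sketch below is a minimal version for left-censoring at a known point `c`; the function names and the list-based data layout are illustrative assumptions and are unrelated to the R packages named in the abstract.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def tobit_loglik(y, x, beta, sigma, c=0.0):
    """Log-likelihood of a left-censored normal linear model (Tobit).

    y: responses, x: rows of covariates, beta: coefficients,
    sigma: error standard deviation, c: censoring point (assumed known).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = sum(b * v for b, v in zip(beta, xi))
        if yi <= c:
            # Censored: we only know the latent response fell at or below c.
            total += math.log(norm_cdf((c - mu) / sigma))
        else:
            # Observed: ordinary normal density.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

Replacing the normal density and distribution function with those of a Student's t or a skew-normal distribution gives the flavour of the heavier-tailed and asymmetric alternatives the abstract describes.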
344

Quantile regression for mixed-effects models / Regressão quantílica para modelos de efeitos mistos

Galarza Morales, Christian Eduardo, 1988- 27 August 2018
Advisor: Víctor Hugo Lachos Dávila / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica / Made available in DSpace on 2018-08-27T06:40:31Z (GMT). Previous issue date: 2015 / Abstract: Longitudinal data are frequently analyzed using normal mixed-effects models. Moreover, traditional estimation methods are based on mean regression, which leads to non-robust parameter estimates when the error distribution is not normal. Compared with the conventional mean regression approach, quantile regression (QR) can characterize the entire conditional distribution of the outcome variable and is more robust to outliers and to misspecification of the error distribution. This thesis develops a likelihood-based approach to analyzing QR models for correlated continuous longitudinal data via the asymmetric Laplace distribution (ALD). Exploiting the convenient hierarchical representation of the ALD, our classical approach uses the stochastic approximation of the EM (SAEM) algorithm to derive exact maximum likelihood (ML) estimates of the fixed effects and variance components in linear and nonlinear mixed-effects models. We evaluate the finite-sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to four real data sets. The proposed SAEM algorithms are implemented in the R packages qrLMM() and qrNLMM(), respectively. / Master's / Statistics / Master in Statistics
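The link between quantile regression and the asymmetric Laplace distribution rests on the check loss: maximizing an ALD likelihood in the location parameter amounts to minimizing this loss. The sketch below shows the loss for the simplest case, a single quantile with no covariates and no random effects; the function names and the toy data are illustrative assumptions, not part of qrLMM or qrNLMM.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(ys, tau):
    """Return the data point that minimises the total check loss."""
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, tau) for y in ys))


# Toy data with one outlier (hypothetical values).
ys = [1, 2, 3, 4, 100]
median = sample_quantile_by_check_loss(ys, 0.5)
upper = sample_quantile_by_check_loss(ys, 0.9)
```

With these toy values the minimiser for `tau = 0.5` is the median, 3, while the sample mean is 22, which illustrates the robustness to outliers that the abstract attributes to quantile regression.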
345

Modelos para análise de dados discretos longitudinais com superdispersão / Models for analysis of longitudinal discrete data in the presence of overdispersion

Rizzato, Fernanda Bührer 08 February 2012
Longitudinal count and binary data are very common and can often be analyzed with the Poisson and Bernoulli distributions, respectively, both members of the exponential family. Two of the main difficulties in modelling such data are: (1) overdispersion, i.e., variability in the data that is not adequately captured by the model, which often imposes a fixed relationship between the mean and the variance, and (2) the correlation among measurements taken repeatedly on the same experimental unit. One way to accommodate overdispersion is to use the negative binomial and beta-binomial distributions, in other words, to include a gamma-distributed random effect for count data and a beta-distributed random effect for binary data, both introduced multiplicatively. To accommodate the correlation between measurements made on the same individual, normally distributed random effects can be included in the linear predictor. These situations can occur separately or simultaneously. Molenberghs et al. (2010) proposed models that generalize the Poisson-normal and Bernoulli-normal generalized linear mixed models by incorporating overdispersion. These models were formulated and fitted to the data using maximum likelihood estimation. However, random-effects models also lend themselves naturally to a Bayesian approach. In this work, we present Bayesian hierarchical models for longitudinal count and binary data in the presence of overdispersion. The hierarchical Bayesian analysis is based on Markov chain Monte Carlo (MCMC) methods, and the software WinBUGS is used for the computational implementation. The methodology for count data is used to analyse a data set from a clinical trial in epileptic patients, and the methodology for binary data is used to analyse a data set from a clinical trial on the toenail infection onychomycosis.
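The overdispersion mechanism described above can be checked with a few lines of arithmetic. If a Poisson rate `mu` is multiplied by a gamma random effect with mean 1 and variance `1/k`, the marginal count is negative binomial with variance `mu + mu**2 / k`, which always exceeds the Poisson variance `mu`. The sketch below encodes that formula and a simple variance-to-mean diagnostic; the function names and the sample counts are illustrative assumptions.

```python
def negbin_variance(mu, k):
    """Marginal variance of a Poisson count with a multiplicative
    gamma random effect of mean 1 and variance 1/k."""
    return mu + mu * mu / k


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical counts with many small values and one large one.
index = dispersion_index([0, 0, 1, 2, 12])
```

An index well above 1, as for these made-up counts, is the usual first sign that a plain Poisson model understates the variability.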
346

Užití modelů diskrétních dat / Application of count data models

Reichmanová, Barbora January 2018
When analysing data on plant growth in a row of a given length, we should consider both the probability that a seed grows successfully and the random number of seeds that were sown. Throughout the thesis we therefore study random sums, in which the number of independent, identically distributed summands is a random variable independent of them. The first part of the thesis covers the theoretical background: it defines the notion of a random sum and presents its properties, such as numerical measures of location and the functional characteristics describing the distribution. The maximum likelihood method of parameter estimation and generalized linear models are then discussed. The quasi-likelihood method is also briefly mentioned. This part is illustrated with examples related to the motivating problem. The last chapter is devoted to an application to real data and the subsequent analysis.
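For a random sum S = X_1 + ... + X_N with N independent of the summands, the first two moments follow from the standard identities E[S] = E[N] E[X] and Var[S] = E[N] Var[X] + Var[N] E[X]^2. The sketch below applies them to a seed example; the Poisson number of seeds and the germination probability are hypothetical values chosen for illustration, not figures from the thesis.

```python
def random_sum_moments(mean_n, var_n, mean_x, var_x):
    """Mean and variance of S = X_1 + ... + X_N, with N independent of the X_i."""
    mean_s = mean_n * mean_x
    var_s = mean_n * var_x + var_n * mean_x ** 2
    return mean_s, var_s


# Hypothetical example: N ~ Poisson(20) seeds sown, each germinating
# independently with probability 0.7 (a Bernoulli summand).
p = 0.7
mean_s, var_s = random_sum_moments(20.0, 20.0, p, p * (1 - p))
```

In this example the mean and variance of the number of plants coincide, as expected, because a Poisson count thinned by independent Bernoulli trials is again Poisson.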
347

Analysis of road traffic accidents in Limpopo Province using generalized linear modelling

Mphekgwana, Modupi Peter January 2020
Thesis (M.Sc. (Statistics)) -- University of Limpopo, 2020 / Background: Deaths and economic losses due to road traffic accidents (RTAs) are a major global public health and development problem that needs urgent attention. Each year nearly 1.24 million people die and millions suffer various forms of disability as a result of road accidents. This makes road traffic injuries (RTIs) the eighth leading cause of death globally, and RTIs are set to become the fifth leading cause of death worldwide by 2030 unless urgent action is taken. Aim: In this study, we investigate factors that contribute to road traffic deaths (RTDs) in the Limpopo province of South Africa using generalized linear models (GLMs) and zero-inflated models. Methods: The study was based on retrospective data comprising reports of 18,029 road traffic accidents and 4,944 road traffic deaths over the years 2009 to 2015. Generalized linear modelling and zero-inflated models were used to identify factors and determine their relationships to RTDs. Results: The data were split into two categories: deaths that occurred during holidays and those that occurred during non-holiday periods. Monday, human actions, vehicle conditions and vehicle makes were significant predictors of RTDs during holidays. During non-holiday periods, weekend, Tuesday, Wednesday, national road, provincial road, sedan, LDV, combi and bus were significant predictors of road traffic deaths. Conclusion: GLM techniques such as the standard Poisson regression model and the negative binomial (NB) model did little to explain the excess zeros, whereas zero-inflated models such as the zero-inflated negative binomial (ZINB) were useful in explaining them. Recommendation: The study recommends that the government make more personnel available during the festive seasons, such as the December holidays, and over weekends.
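A zero-inflated model treats each observation as either a structural zero or a draw from an ordinary count distribution. The sketch below uses the zero-inflated Poisson as the simplest member of the family; the thesis reports the zero-inflated negative binomial, so this is a simplification, and the function and parameter names are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass function, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise an ordinary Poisson(lam) count."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base
```

The probability of a zero is `pi + (1 - pi) * exp(-lam)`, which is larger than the Poisson value `exp(-lam)` whenever `pi > 0`; this extra mass is what lets the model absorb excess zeros.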
348

[pt] EXPERIMENTOS COM MISTURA: UMA APLICAÇÃO COM RESPOSTAS NÃO-NORMAIS / [en] MIXTURE EXPERIMENTS: AN APPLICATION WITH NONNORMAL RESPONSES

LUIZ HENRIQUE ABREU DAL BELLO 03 January 2006
This dissertation presents a real practical case and brings together the statistical techniques needed to treat mixture experiments. It shows that the methodologies of Design of Experiments must be adapted to handle mixture problems, because the basic constraint of this type of experiment must be taken into account: the proportions of all mixture components must always sum to 1, that is, 100%. The delay compound experiment, the main and motivating subject of this dissertation, is a mixture experiment in which the proportions of all three components have simultaneous upper and lower constraints. With these constraints, the restricted design space is heavily distorted relative to the simplex, so a D-optimal design had to be generated. Because there was an indication that the response variance is not constant in the case of the delay compound, Generalized Linear Models, specifically the quasi-likelihood method, were used to fit an adequate model. With an adequate model in hand, it was possible to determine the proportions of the delay compound components that meet the design specification.
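The constraints that define a restricted mixture region are easy to state in code. The sketch below checks whether a candidate blend satisfies the sum-to-one constraint and the component bounds, and whether a set of bounds leaves any feasible region at all. The bounds used in the example are hypothetical and are not the delay compound limits from the dissertation.

```python
def is_feasible_mixture(x, lower, upper, tol=1e-9):
    """True if the proportions x respect the bounds and sum to 1."""
    in_bounds = all(lo - tol <= xi <= hi + tol
                    for xi, lo, hi in zip(x, lower, upper))
    return in_bounds and abs(sum(x) - 1.0) <= tol


def bounds_are_consistent(lower, upper):
    """The constrained region is non-empty exactly when each lower bound
    is at most its upper bound and sum(lower) <= 1 <= sum(upper)."""
    ordered = all(lo <= hi for lo, hi in zip(lower, upper))
    return ordered and sum(lower) <= 1.0 <= sum(upper)


# Hypothetical bounds for a three-component mixture.
lower = [0.4, 0.2, 0.1]
upper = [0.6, 0.4, 0.3]
ok = is_feasible_mixture([0.5, 0.3, 0.2], lower, upper)
```

Because the feasible set is a slice of a box rather than the full simplex, standard simplex designs do not fit it, which is why a computer-generated design such as a D-optimal one is used.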
349

The Impact of Weather on Residential Fires in Sweden: A Regression Analysis / Väders Inverkan på Bostadsbränder i Sverige: En Regressionsanalys

Reineck, Viktor, Ulfsparre, Folke January 2019
The purpose of this report is to investigate possible relationships between the number of residential fires in Sweden and various weather parameters. The study is based on a hypothesis stated by the MSB, the Swedish Civil Contingencies Agency (Myndigheten för Samhällsskydd och Beredskap), that behavioral factors related to weather can influence the number of residential fires. Generalized linear models were used, specifically Poisson and negative binomial regression. The aim was to map the possible connection and determine whether the analysis could be used as a tool to improve the emergency services in Sweden. Temperature, short-term differences in temperature and precipitation were analyzed with residential fires as the dependent variable, which resulted in a model for each municipality in Sweden. The relationships between the weather parameters and residential fires, seen across Sweden, proved to be weak to non-existent with one exception. The average temperature variable was significant in 117 out of 290 municipalities and indicated a relationship in which the expected number of residential fires decreases as temperature increases. Because of the weak relationships seen across Sweden, the model is not recommended as a prognostic tool at a national level. However, individual models could be used as a supplement to current prognostic tools at a local level and for preventive purposes. The study thus concludes that weather has some impact on the expected number of residential fires and has the potential to be used as a tool when forecasting residential fires. As an addition to the regression analysis, an organizational analysis of the emergency services in Sweden is carried out. The analysis sought the optimal structure given the emergency services' conditions and requirements, which were defined on the basis of basic organizational concepts and methods. The result was a more structured operation and organization in which methods and processes are managed at a centralized level.
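In a Poisson or negative binomial regression with a log link, a negative temperature coefficient translates into a multiplicative decrease in the expected count. The sketch below shows that interpretation; the intercept and slope are made-up numbers for illustration and are not estimates from the report.

```python
import math


def expected_count(intercept, slope, temperature):
    """Expected number of fires under a log link: exp(b0 + b1 * temperature)."""
    return math.exp(intercept + slope * temperature)


def rate_ratio(slope, delta=1.0):
    """Multiplicative change in the expected count when temperature rises by delta."""
    return math.exp(slope * delta)


# Hypothetical coefficients for a single municipality.
b0, b1 = 1.0, -0.02
ratio_10_degrees = rate_ratio(b1, 10.0)
```

A ratio below 1 for a 10 degree increase corresponds to the pattern reported for the average temperature variable: fewer expected residential fires as temperature rises.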
350

Supervised Learning for Prediction of Tumour Mutational Burden / Användning av statistisk inlärning för estimering av mutationsbörda

Hargell, Joanna January 2021
Tumour Mutational Burden (TMB) is a promising biomarker for predicting response to immunotherapy. In this thesis, three supervised learning methods were used to predict TMB: GLM, decision trees and SVM. Predictions were based on data from targeted DNA sequencing, using variants found in the exonic, intronic, UTR and intergenic regions of human DNA. The project was exploratory in nature and performed in a pan-cancer setting. Both regression and classification were considered. The purpose was to investigate whether variants found in these regions of the DNA sequence are useful when predicting TMB. Poisson regression and negative binomial regression were used within the GLM framework. The results indicated deficiencies in the model assumptions and that the use of GLM for this application is questionable. The single regression tree did not yield satisfactory prediction accuracy. However, performance improved with variance-reducing methods such as bagging and random forests. Boosted regression trees did not yield any significant improvement in prediction accuracy. In the classification setting, both binary and multiple classes were considered. The distinction between classes was based on thresholds commonly used in clinical care to qualify for immunotherapy. SVM and classification trees yielded high prediction accuracy in the binary case: misclassification rates of 0.0242 and 0, respectively, on the independent test set. In the multiple-class setting, bagging and random forests were implemented yet did not improve performance over the single classification tree. SVM produced a misclassification rate of 0.103, and the corresponding figure for the single classification tree was 0.109. It was concluded that SVM and decision trees are suitable methods for predicting TMB based on targeted gene panels. However, to obtain reliable predictions, there is a need to move from a pan-cancer setting to a diagnosis-based setting. Furthermore, parameters affecting TMB, such as pre-analytical factors, need to be included in the statistical analysis.
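The misclassification rates quoted above are simply the fraction of test cases whose predicted class differs from the true class. A minimal sketch, with hypothetical class labels that do not come from the thesis data:

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical TMB classes for four test samples.
truth = ["high", "low", "low", "high"]
predicted = ["high", "low", "high", "high"]
rate = misclassification_rate(truth, predicted)
```

A rate computed this way is only meaningful on data held out from training, which is why the abstract stresses that its figures come from an independent test set.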
