21

A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA

Hu, Yin 01 January 2013
The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the transcriptional landscape of a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets that does not depend on transcript annotations. Starting directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Building on the identification of alternative splicing modules, the framework then performs precise and efficient differential analysis on automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond classical group-wise analysis, a clustering scheme is further described for mining prominent transcriptional consistency among samples, removing the restriction of presumed groupings. The performance of the framework has been demonstrated in a series of simulation studies and on real datasets, including an analysis of breast cancer data from The Cancer Genome Atlas (TCGA). These successful applications suggest an unprecedented opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or the effects of disease.
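As a minimal sketch of the count-based comparison that splicing-level differential analysis builds on (a generic Fisher exact test on invented inclusion/exclusion read counts for one alternative splicing module, not the dissertation's actual algorithm):

```python
from scipy.stats import fisher_exact

# Hypothetical read counts supporting inclusion vs. exclusion of one
# alternative splicing module in two conditions (toy numbers).
#           inclusion  exclusion
counts = [[120,  80],   # condition A
          [ 60, 140]]   # condition B

odds_ratio, p_value = fisher_exact(counts)
psi_a = counts[0][0] / sum(counts[0])  # "percent spliced in" for A
psi_b = counts[1][0] / sum(counts[1])
print(f"PSI A={psi_a:.2f}, PSI B={psi_b:.2f}, p={p_value:.3g}")
```

A production method additionally has to propagate read-alignment uncertainty and replicate variability, which is the role of the probabilistic alignment inference described in the abstract.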
22

Phylodynamic Methods for Infectious Disease Epidemiology

Rasmussen, David Alan January 2014
In this dissertation, I present a general statistical framework for phylodynamic inference that can be used to estimate epidemiological parameters and reconstruct disease dynamics from pathogen genealogies. The framework can fit a broad class of epidemiological models, including nonlinear stochastic models, to genealogies by relating the population dynamics of a pathogen to its genealogy using coalescent theory. By combining Markov chain Monte Carlo and particle filtering methods, efficient Bayesian inference of all parameters and unobserved latent variables is possible even when analytical likelihood expressions are not available under the epidemiological model. Through extensive simulations, I show that this method can reliably estimate epidemiological parameters of interest and reconstruct past disease dynamics from genealogies, or jointly from genealogies and other common sources of epidemiological data such as time series. I then extend this basic framework to include different types of host population structure, including models with spatial structure, multiple hosts or vectors, and different stages of infection. The latter is demonstrated by using a multistage model of HIV infection to estimate stage-specific transmission rates and incidence from HIV sequence data collected in Detroit, Michigan. Finally, to demonstrate how the approach can be used more generally, I consider the case of dengue virus in southern Vietnam. I show how earlier phylodynamic inference methods fail to reliably reconstruct the dynamics of dengue observed in hospitalization data, but that by deriving coalescent models that take into consideration ecological complexities like seasonality, vector dynamics, and spatial structure, accurate dynamics can be reconstructed from genealogies. In sum, by extending phylodynamics to include more ecologically realistic and mechanistic models, this framework can provide more accurate estimates and give deeper insight into the processes driving infectious disease dynamics.
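The MCMC-plus-particle-filtering combination can be illustrated on a simpler cousin of the genealogy problem: estimating the likelihood of a stochastic SIR model from a case-count time series. The sketch below is a generic bootstrap particle filter under assumed chain-binomial dynamics and Poisson reporting; all parameter values and data are invented, and a phylodynamic version would replace the observation density with the coalescent likelihood of the genealogy:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)

def sir_step(S, I, beta, gamma, N):
    # One chain-binomial SIR step per reporting interval, vectorized
    # over particles: infections, then recoveries.
    new_inf = rng.binomial(S, 1 - np.exp(-beta * I / N))
    new_rec = rng.binomial(I, 1 - np.exp(-gamma))
    return S - new_inf, I + new_inf - new_rec, new_inf

def pf_loglik(beta, gamma, cases, N=10_000, I0=10, J=1000):
    """Bootstrap particle filter estimate of log p(cases | beta, gamma)."""
    S, I = np.full(J, N - I0), np.full(J, I0)
    logL = 0.0
    for y in cases:
        S, I, new_inf = sir_step(S, I, beta, gamma, N)
        logw = poisson.logpmf(y, np.maximum(new_inf, 1e-9))  # observation
        m = logw.max()
        w = np.exp(logw - m)
        logL += m + np.log(w.mean())                  # likelihood increment
        idx = rng.choice(J, size=J, p=w / w.sum())    # multinomial resample
        S, I = S[idx], I[idx]
    return logL

cases = [12, 30, 55, 90, 120, 110, 80, 50, 30, 15]   # toy weekly counts
print(pf_loglik(beta=1.5, gamma=1.0, cases=cases))
```

Because the particle filter's likelihood estimate is unbiased, it can be plugged into an outer Metropolis-Hastings chain over (beta, gamma), which is the pseudo-marginal construction underlying the framework described above.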
23

Computerized achievement tests: sequential and fixed-length tests

Wiberg, Marie H. January 2003
The aim of this dissertation is to describe how a computerized achievement test can be constructed and used in practice. Throughout, the focus is on classifying examinees into masters and non-masters depending on their ability; no attempt is made to estimate that ability. In paper I, a criterion-referenced computerized test with a fixed number of items is expressed as a statistical inference problem. The theory of optimal design is used to find the test with the strongest power. A formal proof shows that all items should have the same item characteristics (high discrimination, low guessing, and difficulty near the cutoff score) in order to give the most powerful statistical test. An efficiency study shows how many more non-optimal items are needed to achieve the same power as a test built from optimal items. In paper II, a computerized mastery sequential test is examined using sequential analysis. The focus is on the sequential probability ratio test and on minimizing the number of items in a test, i.e., minimizing the average sample number (ASN) function. Conditions under which the ASN function decreases are examined. Further, it is shown that the optimal values of item discrimination and item guessing are the same as for fixed-length tests, but the optimal item difficulty differs. Paper III presents three simulation studies of sequential computerized mastery tests, covering three cases: the examinees' responses are identically distributed, not identically distributed, or not identically distributed with estimation errors in the item characteristics. The simulations indicate that the observed results for the operating characteristic function differ significantly from the theoretical results. The mean number of items in a test, the distribution of test length, and its variance depend on whether the true values of the item characteristics are known and on whether the responses are iid. In paper IV, computerized tests containing both pretested items with known item parameters and try-out items with unknown item parameters are considered. The aim is to study how the item parameters of try-out items can be estimated within a computerized test. Although the unknown abilities of the examinees act as nuisance parameters, the asymptotic variance of the item parameter estimators can be calculated. Examples show that a more reliable variance estimator yields much larger estimates of the variance than commonly used variance estimators.
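To make the sequential testing idea in papers II and III concrete, here is a toy Wald sequential probability ratio test for mastery classification with 3PL items; the item parameters, cutoff abilities, and error rates are all assumed values, not the thesis's settings:

```python
import numpy as np

rng = np.random.default_rng(7)

def p_correct(theta, a=2.0, b=0.0, c=0.2):
    # 3PL item response function: discrimination a, difficulty b, guessing c.
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def sprt_mastery(responses, theta0=-0.5, theta1=0.5, alpha=0.05, beta=0.05):
    """Wald SPRT: classify master (theta1) vs. non-master (theta0)."""
    upper = np.log((1 - beta) / alpha)   # cross above -> "master"
    lower = np.log(beta / (1 - alpha))   # cross below -> "non-master"
    p0, p1 = p_correct(theta0), p_correct(theta1)
    llr = 0.0
    for n, u in enumerate(responses, start=1):
        llr += u * np.log(p1 / p0) + (1 - u) * np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "non-master", n
    return "undecided", len(responses)

# Simulate an examinee of true ability 0.8 answering identical 'optimal'
# items (difficulty at the cutoff, as paper I recommends).
responses = (rng.random(100) < p_correct(0.8)).astype(int)
print(sprt_mastery(responses))
```

The ASN function studied in paper II is exactly the expected value of the stopping index `n` returned here, viewed as a function of the examinee's true ability.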
24

Modelo de Grubbs em grupos / Grubbs' model with subgroups

Zeller, Camila Borelli 23 February 2006
Advisor: Filidor Edilfonso Vilca Labra / Master's thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: In this work, we present a study of statistical inference in Grubbs's model with subgroups, an extension of the model proposed by Grubbs (1948, 1973) that is frequently used to compare instruments or measurement methods. We consider the parametrization proposed by Bedrick (2001). The study is based on the maximum likelihood method; hypothesis tests based on the Wald, score, and likelihood ratio statistics are considered. The maximum likelihood estimates of Grubbs's model with subgroups are obtained using the EM algorithm, assuming that the observations follow a normal distribution. We also present a diagnostic analysis of the model, aimed at evaluating the impact that a given subgroup has on the parameter estimates, using the local influence methodology proposed by Cook (1986) under a case-weight perturbation scheme. Finally, we present some simulation studies and illustrate the theoretical results using data found in the literature.
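For orientation, a short sketch of the classical Grubbs (1948) moment estimators for two measuring instruments on simulated data; this is the simple base model, not the dissertation's EM-based subgroup extension:

```python
import numpy as np

rng = np.random.default_rng(3)

# Grubbs's two-instrument model: y_k = x + bias_k + e_k.
n = 200
x = rng.normal(10.0, 2.0, n)             # true item values
y1 = x + rng.normal(0.0, 0.5, n)         # instrument 1 (no bias)
y2 = x + 1.0 + rng.normal(0.0, 0.8, n)   # instrument 2 (additive bias 1.0)

# Moment estimators: Cov(y1, y2) estimates Var(x), and
# Var(y_k) - Cov(y1, y2) estimates the error variance of instrument k.
c = np.cov(y1, y2)
var_x, var_e1, var_e2 = c[0, 1], c[0, 0] - c[0, 1], c[1, 1] - c[0, 1]
bias_diff = (y2 - y1).mean()             # relative bias of instrument 2

print(f"Var(x)={var_x:.2f}  Var(e1)={var_e1:.2f}  "
      f"Var(e2)={var_e2:.2f}  bias diff={bias_diff:.2f}")
```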
25

Avaliação de transcritos diferencialmente expressos em neoplasias humanas com ORESTES / Evaluation of differentially expressed transcripts in human neoplasias using ORESTES (Open Reading Frame ESTs)

Peres, Tarcisio de Souza 30 August 2006
Advisor: Fernando Lopes Alberto / Master's thesis - Universidade Estadual de Campinas, Faculdade de Ciências Médicas / Abstract: Cancer research developed systematically throughout the twentieth century, but the last 25 years in particular were characterized by rapid advances that generated a rich and complex body of knowledge, framing the disease as a dynamic set of alterations in the genome. A complete understanding of the molecular phenomena involved in the physiopathology of neoplasias therefore depends on knowledge of the varied cellular and biochemical processes that characterize the tumor cell and distinguish it from the normal cell (GOLUB and SLONIM, 1999). In this work, we investigated the molecular pathways of the neoplastic process by analyzing cDNA sequence data generated by the Human Cancer Genome Project (CAMARGO, 2001), aiming to identify differentially expressed genes in neoplasias of the following tissues: breast, colon, head and neck, lung, central nervous system, prostate, stomach, testicle, and uterus. The transcript-generation methodology used by the Human Cancer Genome Project is known as ORESTES (DIAS et al, 2000). Initially, the sequencing data (ORESTES fragments) were clustered and assembled by similarity using the PHRED/PHRAP package (EWING and GREEN, 1998). The resulting consensus sequences, each representing a cluster, were compared to known sequences in public databases using the BLAST algorithm (ALTSCHUL et al, 1990). A subgroup of genes was selected based on specific criteria, and their expression levels in different tissues were evaluated using a Bayesian inference approach (CHEN et al, 1998), in contrast to more classical approaches such as null-hypothesis tests (AUDIC and CLAVERIE, 1997). The Bayesian inference was implemented as a computational tool written in PERL (PERES et al, 2005). With support from the literature, a list of genes putatively related to the neoplastic phenotype was created and compared against the gene expression information, becoming one of the parameters of a ranking system defined for selecting genes of interest. In this way, part of the knowledge base on cancer was used together with the gene expression data inferred from the ORESTES fragments. For biological contextualization, the genes were classified according to the Gene Ontology (ASHBURNER et al, 2000) and KEGG (OGATA et al, 1999) nomenclatures. Some of the genes identified as differentially expressed in at least one tumor tissue, relative to its normal counterpart, belong to pathways related to the neoplastic phenomenon (HAHN and WEINBERG, 2002); about 52% of the genes in these pathways showed at least a five-fold differential expression. Finally, ten of the ranked genes were chosen for experimental validation by real-time quantitative PCR in normal and neoplastic gastric tissue samples, and the qPCR results were consistent with the expression levels inferred from the ORESTES fragments.
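As an illustration of the Bayesian style of comparison cited above, here is a generic conjugate Poisson-Gamma test of whether a gene's expression rate is higher in one library than another; this follows the spirit of CHEN et al (1998) but is not the thesis's PERL tool, and all counts are invented:

```python
import numpy as np

rng = np.random.default_rng(5)

def prob_higher_expression(x1, n1, x2, n2, a=1.0, b=1.0, draws=100_000):
    """Posterior P(rate1 > rate2 | counts) under a Poisson model with
    Gamma(a, b) priors: each posterior rate is Gamma(a + x, b + n)."""
    l1 = rng.gamma(a + x1, 1.0 / (b + n1), draws)   # tumor library rate
    l2 = rng.gamma(a + x2, 1.0 / (b + n2), draws)   # normal library rate
    return (l1 > l2).mean()

# Toy data: 15 tags for a gene among 30,000 tumor-library reads versus
# 4 tags among 25,000 normal-library reads.
print(prob_higher_expression(15, 30_000, 4, 25_000))
```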
26

Método Bootstrap na agricultura de precisão / Bootstrap method in precision farming

Dalposso, Gustavo Henrique 15 February 2017
Funding: Fundação Araucária de Apoio ao Desenvolvimento Científico e Tecnológico do Estado do Paraná (FA) / Abstract: One issue in precision agriculture studies concerns the statistical methods applied in inferential analysis, since they require assumptions that sometimes cannot be met. An alternative to traditional methods is the bootstrap method, which draws inferences by resampling the original data set with replacement. The bootstrap methodology can be applied to independent sample data as well as to dependent data, as in spatial statistics; however, using the bootstrap on spatial data requires adaptations of the resampling process. This work aimed to apply the bootstrap method in precision agriculture studies, resulting in three scientific papers. In the first paper, a small data set of soybean yield and soil attributes was used to determine a multiple linear regression model; bootstrap methods were used to select variables, identify influential points, and determine confidence intervals for the model parameters. The results showed that the bootstrap methods made it possible to select the attributes significant to the model, build confidence intervals for the parameters, and identify the points with strong influence on the estimated parameters. In the second paper, the spatial dependence of soybean yield and soil attribute data was studied using the bootstrap method in geostatistical analysis. The spatial bootstrap was used to quantify the uncertainties associated with the spatial dependence structure, the estimators of the fitted model parameters, the kriging predictions, and the multivariate normality assumption of the data, making it possible to quantify uncertainty in every phase of the geostatistical analysis. In the third paper, a spatial linear model was used to analyze soybean yield as a function of soil attributes; spatial bootstrap methods were used to obtain point and interval estimates of the model parameters, to test hypotheses about them, and to build probability plots for assessing normality. These methods made it possible to quantify the uncertainties associated with the spatial dependence structure, evaluate the individual significance of the parameters associated with the mean of the spatial linear model, and verify the multivariate normality assumption. It is concluded that the bootstrap method is an effective alternative for statistical inference in precision agriculture studies.
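A minimal sketch of the case-resampling bootstrap used in the first paper, on invented yield/soil data; note that this simple version resamples observations independently and so ignores spatial dependence, which is precisely why the second and third papers need spatial adaptations of the resampling process:

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy stand-in for a soybean-yield vs. soil-attribute data set.
n = 60
soil_p = rng.normal(20, 5, n)                     # e.g., soil phosphorus
yield_t = 2.5 + 0.08 * soil_p + rng.normal(0, 0.4, n)
X = np.column_stack([np.ones(n), soil_p])

def ols_slope(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Nonparametric bootstrap: resample rows with replacement, refit.
B = 2000
slopes = np.array([
    ols_slope(X[idx], yield_t[idx])
    for idx in (rng.integers(0, n, n) for _ in range(B))
])

lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"slope={ols_slope(X, yield_t):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```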
27

Kvantilová regrese / Quantile Regression

Procházka, Jiří January 2015
The thesis provides a brief introduction to quantile regression theory and is divided into three thematic parts. The first part gives a general introduction to quantile regression, its theoretical aspects, and the basic approaches to estimating quantile regression parameters. The second part focuses on the general and asymptotic properties of quantile regression; its goal is to compare quantile regression with traditional OLS regression and outline possible applications. The third part describes statistical inference: the construction of confidence intervals and the testing of statistical hypotheses about quantile regression parameters. Its goal is to introduce both the traditional approach and approaches based on resampling procedures, and ultimately to compare the different approaches and, where appropriate, propose partial modifications.
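As a small worked contrast with OLS: quantile regression at level tau minimizes the check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), so under heteroscedasticity different quantiles get different slopes. A sketch using the statsmodels implementation on simulated data (variable names and numbers are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Heteroscedastic data: the spread of y grows with x.
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.15 * x, n)
X = sm.add_constant(x)

for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=q)   # minimizes the pinball loss at level q
    print(f"q={q}: intercept={fit.params[0]:.2f}, slope={fit.params[1]:.2f}")
```

The slope increasing across q = 0.1, 0.5, 0.9 is exactly the kind of distributional information a single OLS fit of the conditional mean cannot show.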
28

Modelos lineares generalizados mistos multivariados para caracterização genética de doenças / Multivariate generalized linear mixed models for genetic characterization of diseases

Baldoni, Pedro Luiz, 1989- 24 August 2018
Advisor: Hildete Prisco Pinheiro / Master's thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: Generalized Linear Mixed Models (GLMM) are a natural generalization of Linear Mixed Models (LMM) and Generalized Linear Models (GLM). The GLMM class relaxes the normality assumption, allowing several other probability distributions; it accommodates the overdispersion often observed in practice as well as the correlation among observations in longitudinal or repeated-measures studies. However, the likelihood theory of the GLMM class is not straightforward, since the marginal likelihood function has no closed form and involves a high-dimensional integral. To address this problem, several methodologies have been proposed in the literature, from classical techniques such as numerical quadrature to sophisticated methods involving the EM algorithm, MCMC methods, and penalized quasi-likelihood. These methods have advantages and disadvantages that must be weighed for each problem. In this work, the penalized quasi-likelihood method (Breslow and Clayton, 1993) was used to model disease occurrence in a population of dairy cattle, as it proved robust to the likelihood-theoretic difficulties posed by this data set; the other methods are not computationally tractable given the complexity of problems in quantitative genetics. Additionally, simulation studies are presented to verify the robustness of the methodology. The stability of these estimators and the corresponding robustness theory are not yet fully developed in the literature.
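To make the "no closed form" point concrete: even a single normal random intercept forces numerical integration of the marginal likelihood. Below is a sketch of the classical quadrature fix for one cluster of a random-intercept logistic GLMM, using Gauss-Hermite nodes; this illustrates the integral itself, not the penalized quasi-likelihood method the thesis adopts, and the data are invented:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_marginal_loglik(y, x, beta, sigma_u, n_nodes=30):
    """log of the integral over u ~ N(0, sigma_u^2) of the Bernoulli
    likelihood of one cluster, by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)         # for weight exp(-t^2)
    total = 0.0
    for w, t in zip(weights, nodes):
        u = np.sqrt(2.0) * sigma_u * t          # change of variables
        eta = beta[0] + beta[1] * x + u
        p = 1.0 / (1.0 + np.exp(-eta))
        total += w * np.prod(p**y * (1 - p)**(1 - y))
    return np.log(total / np.sqrt(np.pi))

# Toy cluster: eight binary disease indicators from one animal.
y = np.array([1, 0, 1, 1, 0, 1, 0, 1])
x = np.linspace(-1, 1, 8)
print(cluster_marginal_loglik(y, x, beta=(0.2, 0.8), sigma_u=1.0))
```

With many correlated random effects this integral becomes high-dimensional, which is where quadrature breaks down and approximations such as penalized quasi-likelihood become attractive.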
29

Simulation and Statistical Inference of Stochastic Reaction Networks with Applications to Epidemic Models

Moraes, Alvaro 01 1900
Epidemics have shaped, sometimes more than wars and natural disasters, demographic aspects of human populations around the world, their health habits, and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards at planetary scale. During the spread of an epidemic disease there are phenomena, like the sudden extinction of the epidemic, that cannot be captured by deterministic models; as a consequence, stochastic models have been proposed during the last decades. A typical forward problem in the stochastic setting could be the approximation of the expected number of infected individuals found one month from now. A typical inverse problem could be, given a discretely observed set of epidemiological data, to infer the transmission rate of the epidemic or its basic reproduction number. Markovian epidemic models are stochastic models belonging to a wide class of pure jump processes known as Stochastic Reaction Networks (SRNs), which describe the time evolution of interacting particle systems where one particle interacts with the others through a finite set of reaction channels. SRNs have been developed mainly to model biochemical reactions, but they also have applications in neural networks, virus kinetics, and the dynamics of social networks, among others. This PhD thesis is focused on novel fast simulation algorithms and statistical inference methods for SRNs. Our novel Multilevel Monte Carlo (MLMC) hybrid simulation algorithms provide accurate estimates of expected values of a given observable of an SRN at a prescribed final time. They are designed to control the global approximation error up to a user-selected accuracy and up to a certain confidence level, with near-optimal computational work. We also present novel dual-weighted residual expansions for fast estimation of the weak and strong errors arising from the MLMC methodology. Regarding statistical inference, we first present an innovative multiscale approach, in which we introduce a deterministic, systematic way of using up-scaled likelihoods for parameter estimation while the statistical fitting is done in the base model through the Master Equation. In a different approach, we derive a new forward-reverse representation for simulating stochastic bridges between consecutive observations, which allows us to use the well-known EM algorithm to infer the reaction rates. The forward-reverse methodology is boosted by an initial phase in which, using multiscale approximation techniques, we provide initial values for the EM algorithm.
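The basic simulation object behind the multilevel Monte Carlo estimators is exact simulation of an SRN. Here is a sketch of Gillespie's stochastic simulation algorithm for the Markovian SIR epidemic, which also exhibits the sudden-extinction behavior that deterministic models cannot capture; all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def ssa_sir(beta, gamma, S, I, R, t_max):
    """Gillespie SSA for SIR. Reaction channels:
    S + I -> 2I at rate beta*S*I/N, and I -> R at rate gamma*I."""
    N = S + I + R
    t = 0.0
    while t < t_max and I > 0:
        rates = (beta * S * I / N, gamma * I)
        total = rates[0] + rates[1]
        t += rng.exponential(1.0 / total)       # time to next reaction
        if rng.random() < rates[0] / total:     # which channel fires
            S, I = S - 1, I + 1
        else:
            I, R = I - 1, R + 1
    return R                                    # final epidemic size

# With few initial infecteds, many runs die out early -- an effect a
# deterministic ODE model misses entirely.
finals = [ssa_sir(1.5, 1.0, 997, 3, 0, 1000.0) for _ in range(500)]
print(f"mean final size {np.mean(finals):.0f}, "
      f"fraction under 20: {np.mean(np.array(finals) < 20):.2f}")
```

MLMC accelerates the estimation of expectations over many such paths by coupling coarse (e.g., tau-leap) and fine simulations across levels.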
30

Aspects of Modern Queueing Theory

Ruixin Wang (12873017) 15 June 2022
Queueing systems are everywhere: in transportation networks, service centers, communication systems, clinics, manufacturing systems, etc. In this dissertation, we contribute to the theory of queueing in two aspects. In the first part, we examine the interplay between retrials and strategic arrival behavior in single-class queueing networks. Specifically, we study a variation of the 'Network Concert Queueing Game,' wherein a fixed but large number of strategic users arrive at a network of queues where they can be routed to other queues in the network following a fixed routing matrix, or potentially fed back to the end of the queue at which they arrive. Working in a non-atomic setting, we prove the existence of Nash equilibrium arrival and routing profiles in three simple, but non-trivial, network topologies/architectures; in two of them, we also prove the uniqueness of the equilibrium. Our results prove that Nash equilibrium decisions on when to arrive and which queue to join in a network are substantially impacted by routing, inducing 'herding' behavior under certain conditions on the network architecture. Our theory raises important design implications for capacity-sharing in systems with strategic users, such as ride-sharing and crowdsourcing platforms.

In the second part, we develop a new method of data-driven model calibration, or estimation, for queueing models. Statistical and theoretical analyses of traffic traces show that doubly stochastic Poisson processes are appropriate models of high-intensity traffic arriving at an array of service systems. On the other hand, statistical estimation of the underlying latent stochastic intensity process driving the traffic model involves a rather complicated nonlinear filtering problem. In this thesis we use deep neural networks to 'parameterize' the path measures induced by the stochastic intensity process and solve this nonlinear filtering problem by maximizing a tight surrogate objective called the evidence lower bound (ELBO). This framework is flexible in the sense that we can also estimate other stochastic processes (e.g., the queue length process) and their related parameters (e.g., the service time distribution). We demonstrate the effectiveness of our results through extensive simulations, and we provide approximation guarantees for the estimation/calibration problem. Working with the Markov chain induced by the Euler-Maruyama discretization of the latent diffusion, we show that (1) there exists a sequence of approximate data-generating distributions that converges to the 'ground truth' distribution in total variation distance, and (2) the variational gap is strictly positive for the optimal solution to the ELBO. Extending to the non-Markov setting, we identify the variational-gap-minimizing approximate posterior for an arbitrary (known) posterior and, further, prove a lower bound on the optimal ELBO. Recent theoretical results on optimizing the ELBO for related (but ultimately different) models show that when the data-generating distribution equals the ground truth distribution and the variational gap is zero, the probability measures that achieve these conditions also maximize the ELBO. Our results show that this may not be true in all problem settings.
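As a small sketch of the traffic model in the second part: a doubly stochastic Poisson (Cox) process can be simulated by thinning candidate arrivals against an upper bound on the intensity. The latent intensity below is a toy randomly phased daily cycle, standing in for the diffusion a real calibration would posit:

```python
import numpy as np

rng = np.random.default_rng(8)

def cox_arrivals(t_max, lam, lam_max):
    """Lewis thinning: exact simulation of a Poisson process with
    (possibly random) intensity lam(t) <= lam_max on [0, t_max]."""
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)     # candidate at rate lam_max
        if t > t_max:
            return np.array(arrivals)
        if rng.random() < lam(t) / lam_max:     # keep w.p. lam(t)/lam_max
            arrivals.append(t)

# One realization of the latent intensity: a randomly phased daily cycle.
phase = rng.uniform(0, 2 * np.pi)
lam = lambda t: 50 + 40 * np.sin(2 * np.pi * t / 24 + phase)

arr = cox_arrivals(48.0, lam, lam_max=90.0)
print(f"{len(arr)} arrivals in 48h; first five: {np.round(arr[:5], 2)}")
```

The nonlinear filtering problem the thesis addresses is the inverse of this simulation: recovering the posterior law of the latent intensity path from the observed arrival times.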
