31 |
Simulation and Statistical Inference of Stochastic Reaction Networks with Applications to Epidemic Models. Moraes, Alvaro, 01 1900 (has links)
Epidemics have shaped, sometimes more than wars and natural disasters, demographic aspects of human populations around the world, their health habits, and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards on a planetary scale.
During the spread of an epidemic disease, there are phenomena, such as the sudden extinction of the epidemic, that cannot be captured by deterministic models. As a consequence, stochastic models have been proposed over the last decades. A typical forward problem in the stochastic setting is approximating the expected number of infected individuals one month from now. On the other hand, a typical inverse problem is, given a discretely observed set of epidemiological data, to infer the transmission rate of the epidemic or its basic reproduction number.
Markovian epidemic models are stochastic models belonging to a wide class of pure jump processes known as Stochastic Reaction Networks (SRNs), which describe the time evolution of interacting particle systems in which each particle interacts with the others through a finite set of reaction channels. SRNs were developed mainly to model biochemical reactions, but they also have applications in neural networks, virus kinetics, and the dynamics of social networks, among others.
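The pure jump dynamics described above can be illustrated with the classic Gillespie stochastic simulation algorithm applied to an SIR epidemic, one of the Markovian epidemic models in question (a standard textbook sketch, not the thesis's hybrid algorithm; the rate constants in the usage below are illustrative):

```python
import random

def gillespie_sir(beta, gamma, s0, i0, t_max, seed=0):
    """Gillespie simulation of an SIR epidemic as an SRN with two
    reaction channels: infection S + I -> 2I at rate beta*S*I/N, and
    recovery I -> R at rate gamma*I. Returns (S, I, time) at the end."""
    rng = random.Random(seed)
    n = s0 + i0
    s, i, t = s0, i0, 0.0
    while i > 0 and t < t_max:
        rate_inf = beta * s * i / n
        rate_rec = gamma * i
        total = rate_inf + rate_rec
        # Time to the next reaction is exponential with the total rate.
        t += rng.expovariate(total)
        # Pick which channel fires, proportionally to its rate.
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i -= 1
    return s, i, t

# Example: an epidemic in a population of 1000 with 10 initial infecteds.
final_s, final_i, final_t = gillespie_sir(0.3, 0.1, 990, 10, 1000.0, seed=1)
```

Note how a run can end with the epidemic going extinct early, the stochastic effect that deterministic models cannot capture.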
This PhD thesis is focused on novel fast simulation algorithms and statistical inference methods for SRNs.
Our novel Multi-level Monte Carlo (MLMC) hybrid simulation algorithms provide accurate estimates of expected values of a given observable of an SRN at a prescribed final time. They are designed to control the global approximation error up to a user-selected accuracy and confidence level, with near-optimal computational work. We also present novel dual-weighted residual expansions for fast estimation of the weak and strong errors arising in the MLMC methodology.
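The telescoping idea behind MLMC can be sketched generically as follows (`simulate_level` is a hypothetical stand-in for a level-dependent path simulator such as tau-leap; a real implementation couples the two levels inside each correction term so the variance of the difference is small):

```python
import random

def simulate_level(level, rng):
    # Hypothetical stand-in: returns one sample of the observable computed
    # with time step h = 2**(-level). Here we mimic a quantity whose bias
    # decays like h around the true value 1.0, with noisy fluctuations.
    h = 2.0 ** (-level)
    return 1.0 + h + rng.gauss(0.0, 0.1)

def mlmc_estimate(num_levels, samples_per_level, seed=0):
    """Multilevel Monte Carlo telescoping estimator:
    E[P_L] = E[P_0] + sum over l of E[P_l - P_{l-1}],
    with each term estimated by its own Monte Carlo average."""
    rng = random.Random(seed)
    estimate = 0.0
    for level in range(num_levels):
        n = samples_per_level[level]
        acc = 0.0
        for _ in range(n):
            if level == 0:
                acc += simulate_level(0, rng)
            else:
                # In real MLMC the two levels share randomness (coupling);
                # independent calls are used here purely for illustration.
                acc += simulate_level(level, rng) - simulate_level(level - 1, rng)
        estimate += acc / n
    return estimate
```

With three levels, the estimator converges to the bias of the finest level (here 1 + 2⁻², up to Monte Carlo noise), while most samples can be spent on the cheap coarse levels.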
Regarding the statistical inference aspect, we first present an innovative multiscale approach, in which we introduce a deterministic, systematic way of using up-scaled likelihoods for parameter estimation while the statistical fitting is done in the base model through the Master Equation. In a different approach, we derive a new forward-reverse representation for simulating stochastic bridges between consecutive observations. This allows us to use the well-known EM algorithm to infer the reaction rates. The forward-reverse methodology is boosted by an initial phase in which, using multiscale approximation techniques, we provide initial values for the EM algorithm.
|
32 |
Aspects of Modern Queueing Theory. Ruixin Wang (12873017), 15 June 2022 (has links)
<p>Queueing systems are everywhere: in transportation networks, service centers, communication systems, clinics, manufacturing systems, etc. In this dissertation, we contribute to the theory of queueing in two aspects. In the first part, we study the interplay between retrials and strategic arrival behavior in single-class queueing networks. Specifically, we study a variation of the ‘Network Concert Queueing Game,’ wherein a fixed but large number of strategic users arrive at a network of queues where they can be routed to other queues in the network following a fixed routing matrix, or potentially fed back to the end of the queue they arrive at. Working in a non-atomic setting, we prove the existence of Nash equilibrium arrival and routing profiles in three simple, but non-trivial, network topologies/architectures. In two of them, we also prove the uniqueness of the equilibrium. Our results prove that Nash equilibrium decisions on when to arrive and which queue to join in a network are substantially impacted by routing, inducing ‘herding’ behavior under certain conditions on the network architecture. Our theory raises important design implications for capacity-sharing in systems with strategic users, such as ride-sharing and crowdsourcing platforms.</p>
<p>In the second part, we develop a new method of data-driven model calibration or estimation for queueing models. Statistical and theoretical analyses of traffic traces show that the doubly stochastic Poisson processes are appropriate models of high intensity traffic arriving at an array of service systems. On the other hand, the statistical estimation of the underlying latent stochastic intensity process driving the traffic model involves a rather complicated nonlinear filtering problem. In this thesis we use deep neural networks to ‘parameterize’ the path measures induced by the stochastic intensity process, and solve this nonlinear filtering problem by maximizing a tight surrogate objective called the evidence lower bound (ELBO). This framework is flexible in the sense that we can also estimate other stochastic processes (e.g., the queue length process) and their related parameters (e.g., the service time distribution). We demonstrate the effectiveness of our results through extensive simulations. We also provide approximation guarantees for the estimation/calibration problem. Working with the Markov chain induced by the Euler-Maruyama discretization of the latent diffusion, we show that (1) there exists a sequence of approximate data generating distributions that converges to the “ground truth” distribution in total variation distance; (2) the variational gap is strictly positive for the optimal solution to the ELBO. Extending to the non-Markov setting, we identify the variational gap minimizing approximate posterior for an arbitrary (known) posterior and further, prove a lower bound on the optimal ELBO. Recent theoretical results on optimizing the ELBO for related (but ultimately different) models show that when the data generating distribution equals the ground truth distribution and the variational gap is zero, the probability measures that achieve these conditions also maximize the ELBO. 
Our results show that this may not be true in all problem settings.</p>
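The bound being maximized, the ELBO, can be illustrated for a discrete latent variable in a few lines (a toy model, not the neural parameterization used in the dissertation): the ELBO equals the log evidence exactly when the variational distribution q is the true posterior, and is strictly smaller otherwise, the gap being the KL divergence.

```python
import math

def elbo(log_joint, q):
    """Evidence lower bound for a discrete latent variable z:
    ELBO(q) = sum over z of q(z) * (log p(x, z) - log q(z)).
    Always satisfies ELBO(q) <= log p(x), with equality iff q = p(z|x)."""
    return sum(qz * (log_joint[z] - math.log(qz))
               for z, qz in q.items() if qz > 0.0)

# Toy model with two latent states and joint probabilities p(x, z).
log_joint = {0: math.log(0.3), 1: math.log(0.1)}
log_evidence = math.log(0.4)          # log p(x) = log(0.3 + 0.1)

# The exact posterior q(z) = p(z|x) = (0.75, 0.25) makes the bound tight.
posterior = {0: 0.75, 1: 0.25}
```

Any other q, e.g. the uniform distribution, yields a strictly smaller value, which is the "strictly positive variational gap" phenomenon the abstract refers to.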
|
33 |
Statistical Inference for r-out-of-n F-system Based on Birnbaum-Saunders Distribution. Zhou, Yiliang, January 2017 (has links)
The r-out-of-n F-system and the load-sharing system are very common in industrial engineering. Statistical inference is developed here for an equal load-sharing r-out-of-n F-system based on the Birnbaum-Saunders (BS) lifetime distribution. A simulation study is carried out with different parameter values and different censoring rates in order to examine the performance of the proposed estimation method. Moreover, to find maximum likelihood estimates numerically, three methods of finding initial values for the parameters are developed: the pseudo-complete sample method, the Type-II modified moment estimators of the BS distribution, and a stochastic approximation method. These three methods are then compared based on the number of iterations and simulation time. Two real data sets and one simulated data set are used for illustrative purposes. Finally, some concluding comments are made, including possible future directions for investigation. / Thesis / Master of Science (MSc)
|
34 |
New Methods of Variable Selection and Inference on High Dimensional Data. Ren, Sheng, January 2017 (has links)
No description available.
|
35 |
Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data. Zhong, Jianling, January 2015 (has links)
<p>Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape. </p><p>We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations. </p><p>We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. 
We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites. </p><p>Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets. </p><p>This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.</p> / Dissertation
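The model-comparison idea behind such a score can be sketched with a toy Poisson version (illustrative only; the dissertation's actual score also models the quadratic and oscillatory DNase I cut pattern across the nucleosome body):

```python
import math

def log_bayes_factor(counts, rate_nucleosome, rate_background):
    """Log Bayes factor comparing two Poisson rate profiles for digestion
    counts within a candidate nucleosome window: positive values favor
    the nucleosome model over the background model."""
    def log_lik(rates):
        # Sum of Poisson log-likelihoods across positions in the window.
        return sum(c * math.log(r) - r - math.lgamma(c + 1)
                   for c, r in zip(counts, rates))
    return log_lik(rate_nucleosome) - log_lik(rate_background)

# Toy window of three positions: counts that match the nucleosome profile
# produce a positive score relative to a flat background profile.
score = log_bayes_factor([2, 1, 3], [2.0, 1.0, 3.0], [1.0, 1.0, 1.0])
```

Scanning such a score along the genome, positions with high values are candidate nucleosome placements; the jointly modeled window is what extracts signal from noisy per-base counts.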
|
36 |
Model-Based Population Genetics in Indigenous Humans: Inferences of Demographic History, Adaptive Selection, and African Archaic Admixture using Whole-Genome/Exome Sequencing Data. Hsieh, PingHsun, January 2016 (has links)
Reconstructing the origins and evolutionary journey of humans is a central pursuit of biology. Complementary to archeology, population genetics, which studies genetic variation among individuals in extant populations, has made considerable progress in understanding the evolution of our species. In particular, studies of indigenous humans provide valuable insights into human prehistory because their life history closely resembles that of our ancestors. Despite these efforts, it can be difficult to disentangle population genetic inferences because of the interplay among evolutionary forces, including mutation, recombination, selection, and demographic processes. To date, few studies have adopted a comprehensive framework to jointly account for these confounding effects. The shortage of such an approach inspired this dissertation work, which centered on the development of model-based analysis and demonstrated its importance in population genetic inferences. Indigenous African Pygmy hunter-gatherers have long been studied because of interest in their short stature, foraging subsistence strategy in rainforests, and long-term socio-economic relationship with nearby farmers. I proposed detailed demographic models using genomes from seven Western African Pygmies and nine Western African farmers (Appendix A). Statistical evidence was shown for a much deeper divergence than previously thought and for asymmetric migrations, with a larger contribution from the farmers to the Pygmies. The model-based analyses revealed significant adaptation signals in the Pygmies for genes involved in muscle development, bone synthesis, immunity, reproduction, and other functions. I also showed that the proposed model-based approach is robust to the confounding effects of evolutionary forces (Appendix A). In contrast to the low-latitude African homeland of humans, the indigenous Siberians are long-term survivors inhabiting one of the coldest places on Earth.
Leveraging whole exome sequencing data from two Siberian populations, I presented demographic models for these North Asian dwellers that include divergence, isolation, and gene flow (Appendix B). The best-fit models suggested a closer genetic affinity of these Siberians to East Asians than to Europeans. Using the model-based framework, seven NCBI BioSystems gene sets showed significance for polygenic selection in these Siberians. Interestingly, many of these candidate gene sets are heavily related to diet, indicating possible adaptations to special dietary requirements in these populations in cold, resource-limited environments. Finally, I moved beyond studying the history of extant humans to explore the origins of our species in Africa (Appendix C). Specifically, with statistical analyses using genomes only from extant Africans, I rejected the null model of no archaic admixture in Africa and in turn gave the first whole-genome evidence for interbreeding among human species in Africa. Using extensive simulation analyses under various archaic admixture models, the results suggest recurrent admixture between the ancestors of archaic and modern Africans, with evidence that at least one such event occurred in the last 30,000 years in Africa.
|
37 |
Métodos alternativos para realização de testes de hipóteses em delineamentos experimentais. / Alternative methods for testing hypotheses in experimental designs. Nesi, Cristiano Nunes, 17 July 2002 (has links)
In experimental statistics, specifically in the analysis of variance, hypothesis tests have been widely used to draw conclusions about the sources of variation considered in linear models. To this end, it is common to use statistical systems that supply analyses of variance and the F statistic, among others, for decision making. However, the F test in an analysis of variance for treatments with more than one degree of freedom provides only general information, related to the average behavior of the treatments. For this reason, objective comparisons should be planned, decomposing the treatment degrees of freedom to obtain more specific information. One technique used for these decompositions is based on contrasts, requiring that each component be explained by a contrast, with all contrasts mutually orthogonal so that the comparisons are independent. However, this technique becomes complex as the number of treatments increases. In view of this, using data from a competition experiment between two groups of sugarcane varieties, in a completely randomized design with six treatments and five replications, and also data from a fictitious competition experiment between maize hybrids in a randomized block design, a technique employing auxiliary variables is proposed to facilitate the orthogonal decomposition of the treatment degrees of freedom. We show that this technique facilitates the orthogonal decomposition and gives results equivalent to those obtained with the CONTRAST statement of PROC GLM in SAS.
Another problem concerns the analysis of factorial experiments with unbalanced samples, given that techniques for estimating missing plots do not solve the problem satisfactorily, especially when many plots are missing. When the data are unbalanced, it is necessary to know which hypotheses are being tested and whether they are of interest to the researcher, owing to the complexity of these hypotheses, especially in the presence of empty cells. Moreover, much has been written about the different analysis-of-variance results presented by statistical systems for unbalanced data with empty cells, which has generated confusion among researchers. To propose an alternative method for obtaining hypotheses of interest, the results of a 2x3 completely randomized factorial experiment with four replications were used to test the effects of three growth regulators (hormones) on the in vitro propagation of two apple rootstocks (cultivars). Thus, given that testing a hypothesis is equivalent to imposing an estimable restriction on the parameters of the model, estimable parametric restrictions were used as an alternative criterion for performing tests of hypotheses of interest in linear models with unbalanced data. The results show that this method allows the researcher to test hypotheses of interest directly, with results equivalent to those found with the CONTRAST statement of PROC GLM in SAS. / For experimental designs, it is usually necessary to perform tests of hypotheses to draw conclusions about the effects considered in the linear models. In these cases, it is common to use statistical software that supplies the analyses of variance and F statistics, among others, for decision making.
However, the F test in an analysis of variance for sources of variation with more than one degree of freedom provides only general information about significant differences among the levels of the factor. Therefore, objective comparisons should be planned, making orthogonal decompositions of the degrees of freedom of the effects of interest to get more specific information. One frequently used technique is based on orthogonal contrasts, so that the comparisons are independent. However, this technique becomes complex as the number of levels of the factor increases. To study alternative methods for making these comparisons, we use data from a yield trial considering two groups of sugarcane varieties, in a completely randomized design with 6 treatments and 5 replications. We also use data from a fictitious experiment comparing maize hybrids in a randomized complete block design. A technique of analysis using dummy variables to facilitate the orthogonal decomposition of the degrees of freedom of treatments is proposed. This technique facilitates the orthogonal decomposition and gives the same results as those obtained with the CONTRAST statement of PROC GLM in SAS. Another situation considered involves experiments with unbalanced data. In this case, it is necessary to know which hypotheses are being tested and whether they are useful. Much has been written on the different results of analysis of variance presented by statistical software for unbalanced data, which can create confusion for the researcher. To illustrate, we used the results of a 2x3 factorial experiment with 4 replicates to test the effect of 3 hormones on the in vitro propagation of 2 apple tree cultivars. Thus, considering that testing a hypothesis is equivalent to imposing an estimable restriction on the parameters of the model, we use these restrictions as an alternative criterion to directly carry out tests of hypotheses in linear models with unbalanced data. The results showed that this procedure is equivalent to that of the CONTRAST statement of PROC GLM in SAS.
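The single-degree-of-freedom contrast test discussed in this abstract can be sketched as follows (a textbook formulation, not the SAS implementation; the group means, sample sizes, and error mean square in the usage are illustrative):

```python
def contrast_test(group_means, n_per_group, mse, coeffs):
    """F statistic for a single contrast among treatment means.
    coeffs must sum to zero; the contrast carries one degree of freedom,
    and its F statistic is SS(contrast) / MSE, with the error degrees of
    freedom coming from the ANOVA."""
    assert abs(sum(coeffs)) < 1e-12, "contrast coefficients must sum to zero"
    estimate = sum(c * m for c, m in zip(coeffs, group_means))
    # Sum of squares for the contrast (one degree of freedom).
    ss_contrast = estimate ** 2 / sum(c * c / n
                                      for c, n in zip(coeffs, n_per_group))
    f_stat = ss_contrast / mse
    return estimate, f_stat

# Example: compare the average of treatments 1 and 2 against treatment 3,
# with 5 replications per treatment and an error mean square of 2.0.
est, f = contrast_test([10.0, 12.0, 20.0], [5, 5, 5], 2.0, [1.0, 1.0, -2.0])
```

A full decomposition of t - 1 treatment degrees of freedom uses t - 1 such contrasts, chosen mutually orthogonal so the component sums of squares add up to the treatment sum of squares.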
|
38 |
Inferência estatística sobre a qualidade do processo de compras públicas diretas em uma Instituição de ensino / Statistical inference on the quality of the direct public procurement process in an educational institution. Araújo, Larissa Barreto de, 23 July 2013 (has links)
Leveraged by the process of globalization, quality has become a decisive function for winning customers and achieving competitiveness. Organizations are therefore evolving to improve their management methods in order to meet the requirements demanded by customers. Within the context of quality management, one of the most widespread methods is the PDCA cycle, which can be deployed within each process of the organization and for the system of processes as a whole. In the service sector, the customer's perception is indispensable for changing control guidelines (improvements). To this end, statistics can be used to enable the collection, processing, and presentation of information, so that the knowledge generated is used, through the PDCA cycle, to achieve improvement goals. In this sense, this work proposes a model of statistical inference on the quality of the direct public procurement process in an educational institution. The routine of the purchasing department was analyzed using quality tools and nonparametric tests to make comparisons between groups of factors, which made it possible to identify the factors that significantly increase process duration. The research approach is qualitative, mainly in data collection and analysis, and quantitative in data treatment; as to its aims, the research is exploratory and descriptive, and as to its means of investigation, it is a case study. Based on the theoretical framework, a procedural model was developed for an experiment involving the PDCA cycle, quality tools, and nonparametric tests for statistical inference on process quality. Applying this model showed that a high purchase value is the most significant factor in the direct public procurement process. Beyond this factor, processes classified as permanent material and processes that pass through more than sixteen departments also increase processing time. In this way, the work contributes to the field of quality management, since it addresses statistical inference on quality in procurement processes, a topic little explored in the public sector.
|
39 |
Statistical inference in population genetics using microsatellites. Csilléry, Katalin, January 2009 (has links)
Statistical inference from molecular population genetic data is currently a very active area of research for two main reasons. First, in the past two decades an enormous amount of molecular genetic data has been produced, and the amount of data is expected to grow even more in the future. Second, drawing inferences about complex population genetics problems, for example understanding the demographic and genetic factors that shaped modern populations, poses a serious statistical challenge. Amongst the many different kinds of genetic data that have appeared in the past two decades, the highly polymorphic microsatellites have played an important role. Microsatellites revolutionized the population genetics of natural populations, and were the initial tool for linkage mapping in humans and other model organisms. Despite their important role, and extensive use, the evolutionary dynamics of microsatellites are still not fully understood, and the statistical methods applied to them are often underdeveloped and do not adequately model microsatellite evolution. In this thesis, I address some aspects of this problem by assessing the performance of existing statistical tools, and developing some new ones. My work encompasses a range of statistical methods from simple hypothesis testing to more recent, complex computational statistical tools. This thesis consists of four main topics. First, I review the statistical methods that have been developed for microsatellites in population genetics applications. I review the different models of the microsatellite mutation process, and ask which models are the most supported by data, and how models were incorporated into statistical methods. I also present estimates of mutation parameters for several species based on published data. Second, I evaluate the performance of estimators of genetic relatedness using real data from five vertebrate populations.
I demonstrate that the overall performance of marker-based pairwise relatedness estimators mainly depends on the population relatedness composition and may only be improved by the marker data quality within the limits of the population relatedness composition. Third, I investigate the different null hypotheses that may be used to test for independence between loci. Using simulations I show that testing for statistical independence (i.e. zero linkage disequilibrium, LD) is difficult to interpret in most cases, and instead a null hypothesis should be tested, which accounts for the “background LD” due to finite population size. I investigate the utility of a novel approximate testing procedure to circumvent this problem, and illustrate its use on a real data set from red deer. Fourth, I explore the utility of Approximate Bayesian Computation, inference based on summary statistics, to estimate demographic parameters from admixed populations. Assuming a simple demographic model, I show that the choice of summary statistics greatly influences the quality of the estimation, and that different parameters are better estimated with different summary statistics. Most importantly, I show how the estimation of most admixture parameters can be considerably improved via the use of linkage disequilibrium statistics from microsatellite data.
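The rejection form of Approximate Bayesian Computation with a single summary statistic can be sketched as follows (a generic illustration with a made-up toy model, not the admixture models studied in the thesis):

```python
import random

def rejection_abc(observed_stat, simulate, prior_sample, n_draws, tol, seed=0):
    """Rejection-sampling ABC: draw parameters from the prior, simulate a
    summary statistic, and keep the draws whose simulated statistic falls
    within `tol` of the observed one. Real applications use several summary
    statistics and a distance function, which is exactly where the choice
    of statistics influences estimation quality."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed_stat) <= tol:
            accepted.append(theta)
    return accepted

# Toy example: infer the mean of a Gaussian from its sample mean.
def prior(rng):
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    # Summary statistic: mean of 50 simulated observations.
    return sum(rng.gauss(theta, 1.0) for _ in range(50)) / 50

approx_posterior = rejection_abc(2.0, simulate, prior, 5000, 0.3, seed=1)
```

The accepted draws approximate the posterior; an uninformative summary statistic would accept draws from all over the prior, illustrating why statistic choice matters.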
|
40 |
Application of random matrix theory to future wireless flexible networks. Couillet, Romain, 12 November 2010 (links) (PDF)
Future cognitive radio networks are expected to come as a disruptive technological advance in the currently saturated field of wireless communications. The idea behind cognitive radios is to think of the wireless channels as a pool of communication resources, which can be accessed on demand by a primary licensed network or opportunistically preempted (or overlaid) by a secondary network with lower access priority. From a physical-layer point of view, the primary network is ideally oblivious of the existence of co-localized secondary networks. The latter are therefore required to autonomously explore the air in search of resource left-overs, and then to optimally exploit the available resources. The exploration and exploitation procedures, which involve multiple interacting agents, are required to be highly reliable, fast, and efficient. The objective of the thesis is to model, analyse, and propose computationally efficient and close-to-optimal solutions to the above operations. Regarding the exploration phase, we first resort to the maximum entropy principle to derive communication models with many unknowns, from which we derive the optimal multi-source multi-sensor Neyman-Pearson signal sensing procedure. The latter allows a secondary network to detect the presence of spectral left-overs. The computational complexity of the optimal approach, however, calls for simpler techniques, which are recollected and discussed. We then proceed to extend the signal sensing approach to the more advanced blind user localization, which provides further valuable information for overlaying occupied spectral resources. The second part of the thesis is dedicated to the exploitation phase, that is, the optimal sharing of available resources.
To this end, we derive an asymptotically accurate approximate expression for the uplink ergodic sum rate of a multi-antenna multiple-access channel and propose solutions for cognitive radios to adapt rapidly to the evolution of the primary network at a minimum feedback cost for the secondary networks.
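As a point of comparison for the sensing problem, the classical energy detector, a simple baseline rather than the optimal Neyman-Pearson statistic derived in the thesis, can be sketched as follows (the signal model and threshold below are illustrative):

```python
import math
import random

def energy_detector(samples, noise_power, threshold_factor):
    """Simple energy detector for spectrum sensing: declare the channel
    occupied when the average sample energy exceeds a chosen multiple of
    the (assumed known) noise power. The threshold_factor trades false
    alarms against missed detections."""
    energy = sum(x * x for x in samples) / len(samples)
    return energy > threshold_factor * noise_power

# Two illustrative traces: pure noise versus a sinusoid buried in noise.
rng = random.Random(42)
noise_only = [rng.gauss(0.0, 1.0) for _ in range(500)]
signal_plus_noise = [rng.gauss(0.0, 1.0) + 1.5 * math.sin(0.3 * k)
                     for k in range(500)]
```

With unit noise power and a threshold factor of 1.3, the sinusoid-bearing trace (average energy near 2.1) is flagged while the pure-noise trace (average energy near 1.0) is not; the optimal detector improves on this when multiple sensors and sources are involved.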
|