About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
21

Assessment of Penalized Regression for Genome-wide Association Studies

Yi, Hui 27 August 2014 (has links)
The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single marker association methods. As an alternative to Single Marker Analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of Penalized Regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by False Discovery Rate (FDR) control, and assess their performance (including penalties incorporating linkage disequilibrium) in comparison with SMA. PR methods were compared with SMA on realistically simulated GWAS data consisting of genotype data from single and multiple chromosomes and a continuous phenotype, and on real data. Based on our comparisons, our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini-Hochberg FDR control (SMA-BH). PR controlled the FDR conservatively while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on variable selection with FDR control. Incorporating linkage disequilibrium (LD) into PR by adapting penalties developed for covariates measured on graphs can improve power but also generates more false positives or wider regions for follow-up. We recommend the Elastic Net with a mixing weight for the Lasso penalty near 0.5 as the best method. / Ph. D.
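The Elastic Net recommendation above can be illustrated with a minimal sketch (not the thesis's own code): scikit-learn's ElasticNetCV with the Lasso mixing weight l1_ratio=0.5 on simulated genotype data. The sample sizes, effect sizes, and the cross-validated choice of the penalty strength are assumptions for illustration; the thesis instead selects the penalty by an analytic FDR criterion.

```python
# Illustrative sketch (not the thesis code): Elastic Net on simulated SNP data
# with a Lasso mixing weight of 0.5, as recommended in the abstract.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n, p, n_causal = 200, 1000, 10                         # p >> n, as in GWAS
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # additive genotype coding 0/1/2
beta = np.zeros(p)
beta[:n_causal] = 0.5                                  # a few causal SNPs (assumed for illustration)
y = X @ beta + rng.normal(size=n)

# l1_ratio=0.5 mixes the Lasso and ridge penalties equally; the penalty strength
# is chosen here by cross-validation, whereas the thesis controls an analytic FDR.
model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_ != 0)
print(f"selected {selected.size} SNPs; first few indices: {selected[:10]}")
```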
22

Bayesian Methods Under Unknown Prior Distributions with Applications to The Analysis of Gene Expression Data

Rahal, Abbas 14 July 2021 (has links)
The local false discovery rate (LFDR) is one of many existing statistical methods for analyzing multiple hypothesis tests. As a Bayesian quantity, the LFDR is based on the prior probability of the null hypothesis and a mixture distribution of the null and non-null hypotheses. In practice, the LFDR is unknown and needs to be estimated. The empirical Bayes approach can be used to estimate that mixture distribution. Empirical Bayes does not require complete information about the prior and hyperprior distributions as in hierarchical Bayes. When we do not have enough information at the prior level, empirical Bayes estimates the prior parameters from the data, often via the marginal distribution, instead of placing a distribution at the hyperprior level as in the hierarchical Bayes model. In this research, we developed new Bayesian methods under unknown prior distributions. A set of adequate prior distributions may be defined using Bayesian model checking by setting a threshold on the posterior predictive p-value, prior predictive p-value, calibrated p-value, Bayes factor, or integrated likelihood. We derive a set of adequate posterior distributions from that set. In order to obtain a single posterior distribution instead of a set of adequate posterior distributions, we used a blended distribution, which minimizes the relative entropy of a set of adequate prior (or posterior) distributions to a "benchmark" prior (or posterior) distribution. We present two approaches to generating a blended posterior distribution, namely updating-before-blending and blending-before-updating. The blended posterior distribution can be used to estimate the LFDR by considering the nonlocal false discovery rate as a benchmark and the different LFDR estimators as an adequate set. The likelihood ratio can often be misleading in multiple testing, unless it is supplemented by adjusted p-values or posterior probabilities based on sufficiently strong prior distributions. In the case of unknown prior distributions, these can be estimated by empirical Bayes methods or blended distributions. We propose a general framework for applying the laws of likelihood to problems involving multiple hypotheses by bringing together multiple statistical models. We have applied the proposed framework to data sets from genomics, COVID-19, and other applications.
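As a point of reference for the two-group setting the abstract starts from, here is a minimal sketch of a plug-in LFDR estimate, lfdr(z) = pi0*f0(z)/f(z), with the mixture density estimated by a kernel method. The simulation settings and the crude null-proportion estimator are assumptions; the blended-posterior estimators developed in the thesis are not reproduced here.

```python
# Minimal two-group local FDR sketch (illustrative only; the thesis develops
# blended-posterior estimators, which are not reproduced here).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, pi0, shift = 5000, 0.9, 3.0                  # assumed simulation settings
is_null = rng.random(m) < pi0
z = np.where(is_null, rng.normal(0, 1, m), rng.normal(shift, 1, m))

f0 = stats.norm.pdf(z)                          # theoretical null density
f = stats.gaussian_kde(z)(z)                    # marginal (mixture) density, kernel estimate
pi0_hat = min(1.0, 2 * np.mean(z < 0))          # crude null-proportion estimate (assumption)
lfdr = np.clip(pi0_hat * f0 / f, 0, 1)          # plug-in local false discovery rate

discoveries = np.flatnonzero(lfdr < 0.2)
print(f"estimated pi0 = {pi0_hat:.2f}, {discoveries.size} tests with lfdr < 0.2")
```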
23

Multiple Testing Correction with Repeated Correlated Outcomes: Applications to Epigenetics

Leap, Katie 27 October 2017 (has links)
Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers that are fixed over the lifetime of an individual, methylation can change. Given that there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we considered different methods of controlling the false discovery rate. We considered several underlying associations between an exposure and methylation over time. We found that testing each site with a linear mixed effects model and then controlling the false discovery rate (FDR) had the highest positive predictive value (PPV), a low number of false positives, and was able to differentiate between differential methylation that was present at only one time point vs. a persistent relationship. In contrast, methods that controlled FDR at a single time point and ad hoc methods tended to have lower PPV, more false positives, and/or were unable to differentiate these conditions. Validation in data obtained from Project Viva found a difference between fitting longitudinal models only to sites significant at one time point and fitting all sites longitudinally.
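A minimal sketch of the best-performing strategy described above, under assumed simulated data: fit one linear mixed-effects model per methylation site (random intercept per subject) with statsmodels, then apply Benjamini-Hochberg FDR control across sites. The data-generating settings and effect sizes are illustrative assumptions, not Project Viva data.

```python
# Sketch: one linear mixed-effects model per site, then BH FDR across sites.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
n_subj, n_time, n_sites = 100, 3, 50
subj = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj)
exposure = np.repeat(rng.binomial(1, 0.5, n_subj), n_time)

pvals = []
for site in range(n_sites):
    effect = 0.5 if site < 5 else 0.0                      # first 5 sites truly differential (assumed)
    meth = (effect * exposure
            + rng.normal(0, 1, n_subj)[subj]               # subject-level random intercept
            + rng.normal(0, 1, n_subj * n_time))           # residual noise
    df = pd.DataFrame({"meth": meth, "exposure": exposure,
                       "time": time, "subject": subj})
    fit = smf.mixedlm("meth ~ exposure + time", df, groups=df["subject"]).fit()
    pvals.append(fit.pvalues["exposure"])

rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{rejected.sum()} sites flagged after BH FDR control")
```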
24

Variable Selection for High-Dimensional Data with Error Control

Fu, Han 23 September 2022 (has links)
No description available.
25

Enhancing Pavement Surface Macrotexture Characterization

Mogrovejo Carrasco, Daniel Estuardo 30 April 2015 (has links)
One of the most important objectives for transportation engineers is to understand pavement surface properties and their positive and negative effects on the user. This can improve the design of the infrastructure, adequacy of tools, and consistency of methodologies that are essential for transportation practitioners regarding macrotexture characterization. Important pavement surface characteristics, or tire-pavement interactions, such as friction, tire-pavement noise, splash and spray, and rolling resistance, are significantly influenced by pavement macrotexture. This dissertation compares static and dynamic macrotexture measurements and proposes an enhanced method to quantify the macrotexture. Dynamic measurements performed with vehicle-mounted lasers have the advantage of measuring macrotexture at traffic speed. One drawback of these laser devices is the presence of 'spikes' in the collected data, which impact the texture measurements. The dissertation proposes two robust and innovative methods to overcome this limitation. The first method is a data-driven adaptive method that detects and removes the spikes from high-speed laser texture measurements. The method first calculates the discrete wavelet transform of the texture measurements. It then detects (at all levels) and removes the spikes from the obtained wavelet coefficients (or differences). Finally, it calculates the inverse discrete wavelet transform with the processed wavelet coefficients (without outliers) to obtain the Mean Profile Depth (MPD) from the measurements with the spikes removed. The method was validated by comparing the results with MPD measurements obtained with a Circular Texture Meter (CTMeter) that was chosen as the control device. Although this first method was able to successfully remove the spikes, it has the drawback that it depends on manual modeling of the distribution of the wavelet coefficients to correctly define an appropriate threshold. The next step of this dissertation therefore proposes an enhancement to the spike-removal methodology for macrotexture measurements taken with high-speed laser devices. This denoising methodology uses an algorithm that models the distribution of texture measurements with the family of Generalized Gaussian Distributions (GGD), along with the False Discovery Rate (FDR) method, which controls the proportion of wrongly identified spikes among all identified spikes. The FDR control allows for an adaptive threshold selection that differentiates between valid measurements and spikes. The validation of the method showed that the MPD results obtained with denoised dynamic measurements are comparable to MPD results from the control devices. This second method is a crucial step in the last stage of this dissertation, as explained below. The last part of the dissertation presents an enhanced macrotexture characterization index based on the Effective Area for Water Evacuation (EAWE), which (1) estimates the potential of the pavement to drain water and (2) correlates better with two pavement surface properties affected by macrotexture (friction and noise) than the current MPD method. The proposed index is defined by a three-step process that (1) removes the spikes, assuring the reliability of the texture profile data; (2) finds the enveloping profile that is necessary to delimit the area between the tire and the pavement when contact occurs; and (3) computes the EAWE. Comparisons of the current (MPD) and proposed (EAWE) macrotexture characterization indices showed that the MPD overestimates the ability of the pavement to drain surface water under a tire. / Ph. D.
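The wavelet-based spike-removal idea of the first method can be sketched as follows, assuming PyWavelets and a simple MAD-based threshold in place of the GGD/FDR threshold selection actually developed in the dissertation; the simulated profile and spike magnitudes are likewise assumptions.

```python
# Sketch of the spike-removal idea: transform the texture profile with a DWT,
# zero out wavelet coefficients flagged as spikes, and invert. The threshold
# rule below (a MAD multiple) is only a stand-in for the GGD/FDR selection.
import numpy as np
import pywt

rng = np.random.default_rng(3)
profile = np.sin(np.linspace(0, 20 * np.pi, 2048)) + 0.1 * rng.normal(size=2048)
spikes = rng.choice(2048, size=15, replace=False)
profile[spikes] += rng.normal(0, 8, size=15)             # artificial laser spikes

coeffs = pywt.wavedec(profile, "db4", level=5)           # discrete wavelet transform
cleaned = [coeffs[0]]                                    # keep approximation coefficients
for d in coeffs[1:]:                                     # detail coefficients at all levels
    mad = np.median(np.abs(d - np.median(d))) / 0.6745   # robust scale estimate
    thresh = 5.0 * mad                                   # assumed threshold; thesis uses FDR
    cleaned.append(np.where(np.abs(d) > thresh, 0.0, d)) # remove flagged coefficients
profile_clean = pywt.waverec(cleaned, "db4")[: len(profile)]

print(f"max |profile| before: {np.abs(profile).max():.1f}, "
      f"after: {np.abs(profile_clean).max():.1f}")
```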
26

Modèles de mélange semi-paramétriques et applications aux tests multiples / Semi-parametric mixture models and applications to multiple testing

Nguyen, Van Hanh 01 October 2013 (has links)
In a multiple testing context, we consider a semiparametric mixture model with two components. One component is assumed to be known and corresponds to the distribution of p-values under the null hypothesis, with prior probability p. The other component f is nonparametric and stands for the distribution under the alternative hypothesis. The problem of estimating the parameters p and f of the model arises in false discovery rate (FDR) control procedures. In the first part of this dissertation, we study the estimation of the proportion p. We discuss asymptotic efficiency results and establish that two different cases occur depending on whether f vanishes on a non-empty interval or not. In the first case, we exhibit estimators converging at the parametric rate, compute the optimal asymptotic variance, and conjecture that no estimator is asymptotically efficient (i.e., attains the optimal asymptotic variance). In the second case, we prove that the quadratic risk of any estimator does not converge at the parametric rate. In the second part of the dissertation, we focus on the estimation of the unknown nonparametric component f in the mixture, relying on a preliminary estimator of p. We propose and study the asymptotic properties of two different estimators for this unknown component. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate (lfdr).
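A crude numerical sketch of the two-component p-value mixture g(x) = p·1 + (1−p)·f(x) may help fix ideas: estimate p with a Storey-type rule, recover f from a histogram estimate of g, and form the local FDR. This is only an illustration under assumed simulation settings, not the randomly weighted kernel or maximum smoothed likelihood estimators studied in the thesis.

```python
# Crude sketch of the two-component p-value mixture: estimate the null proportion p,
# then recover the alternative component f and the local FDR from a density estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
m, p_null = 20000, 0.85                                      # assumed settings
null_p = rng.uniform(size=int(m * p_null))                   # uniform p-values under H0
alt_p = 1 - stats.norm.cdf(rng.normal(2.5, 1, m - null_p.size))  # p-values under H1
pvals = np.concatenate([null_p, alt_p])

lam = 0.5
p_hat = min(1.0, np.mean(pvals > lam) / (1 - lam))           # Storey-type estimate of p

edges = np.linspace(0, 1, 51)
g_hat, _ = np.histogram(pvals, bins=edges, density=True)     # mixture density estimate
f_hat = np.clip((g_hat - p_hat) / (1 - p_hat), 0, None)      # alternative component
lfdr = np.clip(p_hat / g_hat, 0, 1)                          # local FDR on each bin

print(f"p_hat = {p_hat:.3f}; lfdr in first bin (small p-values) = {lfdr[0]:.3f}")
```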
27

Multiple hypothesis testing and multiple outlier identification methods

Yin, Yaling 13 April 2010
Traditional multiple hypothesis testing procedures, such as that of Benjamini and Hochberg, fix an error rate and determine the corresponding rejection region. In 2002 Storey proposed a fixed rejection region procedure and showed numerically that it can gain more power than the fixed error rate procedure of Benjamini and Hochberg while controlling the same false discovery rate (FDR). In this thesis it is proved that when the number of alternatives is small compared to the total number of hypotheses, Storey's method can be less powerful than that of Benjamini and Hochberg. Moreover, the two procedures are compared by setting them to produce the same FDR. The difference in power between Storey's procedure and that of Benjamini and Hochberg is near zero when the distance between the null and alternative distributions is large, but Benjamini and Hochberg's procedure becomes more powerful as the distance decreases. It is shown that modifying the Benjamini and Hochberg procedure to incorporate an estimate of the proportion of true null hypotheses, as proposed by Black, gives a procedure with superior power.

Multiple hypothesis testing can also be applied to regression diagnostics. In this thesis, a Bayesian method is proposed to test multiple hypotheses, of which the i-th null and alternative hypotheses are that the i-th observation is not an outlier versus it is, for i = 1, ..., m. In the proposed Bayesian model, it is assumed that outliers have a mean shift, where the proportion of outliers and the mean shift respectively follow a Beta prior distribution and a normal prior distribution. It is proved in the thesis that for the proposed model, when there exists more than one outlier, the marginal distributions of the deletion residual of the i-th observation under both null and alternative hypotheses are doubly noncentral t distributions. The outlyingness of the i-th observation is measured by the marginal posterior probability that the i-th observation is an outlier given its deletion residual. An importance sampling method is proposed to calculate this probability. This method requires the computation of the density of the doubly noncentral F distribution, and this is approximated using Patnaik's approximation. An algorithm is proposed in this thesis to examine the accuracy of Patnaik's approximation. The comparison of this algorithm's output with Patnaik's approximation shows that the latter can save massive computation time without losing much accuracy.

The proposed Bayesian multiple outlier identification procedure is applied to some simulated data sets. Various simulation and prior parameters are used to study the sensitivity of the posteriors to the priors. The area under the ROC curve (AUC) is calculated for each combination of parameters. A factorial design analysis on AUC is carried out by choosing various simulation and prior parameters as factors. The resulting AUC values are high for various selected parameters, indicating that the proposed method can identify the majority of outliers within tolerable errors. The results of the factorial design show that the priors do not have much effect on the marginal posterior probability as long as the sample size is not too small.

In this thesis, the proposed Bayesian procedure is also applied to a real data set obtained by Kanduc et al. in 2008. The proteomes of thirty viruses examined by Kanduc et al. are found to share a high number of pentapeptide overlaps with the human proteome. In a linear regression analysis of the level of viral overlaps with the human proteome and the length of the viral proteome, it is reported by Kanduc et al. that among the thirty viruses, human T-lymphotropic virus 1, Rubella virus, and hepatitis C virus present relatively higher levels of overlaps with the human proteome than the predicted level of overlaps. The results obtained using the proposed procedure indicate that the four viruses with extremely large sizes (Human herpesvirus 4, Human herpesvirus 6, Variola virus, and Human herpesvirus 5) are more likely to be the outliers than the three reported viruses. The results with the four extreme viruses deleted confirm the claim of Kanduc et al.
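The contrast between the Benjamini-Hochberg procedure and its adaptive modification (plugging in an estimate of the proportion of true nulls, in the spirit of Storey and Black) can be sketched as below; the simulation settings and the lambda = 0.5 estimator of the null proportion are assumptions, not the thesis's own experiments.

```python
# Sketch contrasting plain Benjamini-Hochberg with an adaptive variant that plugs
# in an estimate of the proportion of true nulls.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(5)
m, m_alt, shift = 5000, 250, 2.5
z = rng.normal(size=m)
z[:m_alt] += shift                                   # a small fraction of alternatives
pvals = stats.norm.sf(z)                             # one-sided p-values

reject_bh, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

lam = 0.5
pi0_hat = min(1.0, np.mean(pvals > lam) / (1 - lam)) # estimated proportion of true nulls
reject_adapt, _, _, _ = multipletests(pvals, alpha=0.05 / pi0_hat, method="fdr_bh")

print(f"BH rejections: {reject_bh.sum()}, adaptive BH rejections: {reject_adapt.sum()}")
```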
29

On the performance of hedge funds

Dewaele, Benoît 28 May 2013 (has links)
This thesis investigates the performance of hedge funds, funds of hedge funds and alternative Ucits, together with the determinants of this performance, by using new or well-suited econometric techniques. As such, it lies at the frontier of finance and financial econometrics and contributes to both fields. For the sake of clarity, we summarize the main contributions to each field separately.

The contribution of this thesis to the field of financial econometrics is the time-varying style analysis developed in the second chapter. This statistical tool combines the Sharpe analysis with a time-varying coefficient method, thereby taking the best of both worlds.

Sharpe (1992) developed the idea of “style analysis”, building on the conclusion that a regression taking into account the constraints faced by mutual funds should give a better picture of their holdings. To get an estimate of their holdings, he incorporates, in a standard regression, typical constraints related to the regulation of mutual funds, such as no short-selling and value preservation. He argues that this gives a more realistic picture of their investments and consequently better estimations of their future expected returns.

Unfortunately, in the style analysis, the weights are constrained to be constant. Even if, for funds of hedge funds, the weights should also sum up to 1, given their dynamic nature, the constant weights seem more restrictive than for mutual funds. Hence, the econometric literature was lacking a method incorporating the constraints and the possibility for the weights to vary. Motivated by this gap, we develop a method that allows the weights to vary while being constrained to sum up to 1, by combining the Sharpe analysis with a time-varying coefficient model. As the style analysis has proven to be a valuable tool for mutual fund analysis, we believe our approach offers many potential fields of application both for funds of hedge funds and mutual funds.

The contributions of our thesis to the field of finance are numerous.

Firstly, we are the first to offer a comprehensive and exhaustive assessment of the world of FoHFs. Using both a bootstrap analysis and a method that allows dealing with multiple hypothesis tests straightforwardly, we show that after fees, the majority of FoHFs do not channel alpha from single-manager hedge funds and that only very few FoHFs deliver after-fee alpha per se, i.e. on top of the alpha of the hedge fund indices. We conclude that the added value of the vast majority of FoHFs should thus not be expected to come from the selection of the best HFs but from the risk management-monitoring skills and the easy access they provide to the HF universe.

Secondly, although leverage is one of the key features of funds of hedge funds, there was a gap in the understanding of the impact it might have on the investor's alpha. This was likely due to the quasi-absence of data about leverage and to the fact that the literature was lacking a proper tool to implicitly estimate this leverage.

We fill this gap by proposing a theoretical model of fund of hedge fund leverage and alpha where the cost of borrowing is increasing with leverage. In the literature, this is the first model which integrates the rising cost of borrowing in the leverage decision of FoHFs. We use this model to determine the conditions under which the leverage has a negative or a positive impact on the investor's alpha and show that the manager has an incentive to take a leverage that hurts the investor's alpha. Next, using estimates of the leverages of a sample of FoHFs obtained through the time-varying style analysis, we show that leverage indeed has a negative impact on alphas and appraisal ratios. We argue that this effect may be an explanation for the disappointing alphas delivered by funds of hedge funds and can be interpreted as a potential explanation for the “capacity constraints” effect. To the best of our knowledge, we are the first to report and explain this negative relationship between alpha and leverage in the industry.

Thirdly, we show the interest of the time-varying coefficient model in hedge fund performance assessment and selection. Since the literature underlines that manager skills vary with macro-economic conditions, the alpha should be dynamic. Unfortunately, using ordinary least-squares regressions forces the estimate of the alpha to be constant over the estimation period. The alpha of an OLS regression is thus static whereas the alpha generation process is by nature varying. On the other hand, we argue that the time-varying alpha captures this dynamic behaviour.

As the literature shows that abnormal-return persistence is essentially short-term, we claim that using the quasi-instantaneous detection ability of the time-varying model to determine the abnormal return should lead to outperforming portfolios. Using a persistence analysis, we check this conjecture and show that, contrary to top performers in terms of OLS alpha, the top performers in terms of past time-varying alpha generate superior and significant ex-post performance. Additionally, we contribute to the literature on the topic by showing that persistence exists and can be as long as 3 years. Finally, we use the time-varying analysis to obtain estimates of the expected returns of hedge funds and show that using those estimates in a mean-variance framework leads to better ex-post performance. Therefore, we conclude that in terms of hedge fund performance detection, the time-varying model is superior to the OLS analysis.

Lastly, we investigate the funds that have chosen to adopt the “Alternative UCITS” framework. Contrary to the previous frameworks that were designed for mutual fund managers, this new set of European Union directives can be suited to hedge fund-like strategies. We show that for Ucits funds there is some evidence, although weak, of the added value of offshore experience. On the other hand, we find no evidence of added value in the case of non-offshore-experienced managers. Motivated to further refine our results, we separate Ucits with offshore-experienced managers into two groups: those with equivalent offshore hedge funds (replicas) and those without (new funds). This time, Ucits with no offshore equivalents show low volatility and a strongly positive alpha. Ucits with offshore equivalents, on the other hand, bring no added value and, not surprisingly, show no substantial differences in their risk profile from their paired offshore funds. Therefore, we conclude that offshore experience plays a significant role in creating positive alpha, as long as it translates into real innovations. If the fund is a pure replica, the additional costs brought by the Ucits structure represent a handicap that is hardly compensated. As “Alternative Ucits” have so far been only scarcely investigated, this paper represents a contribution to the better understanding of those funds.

In summary, this thesis improves the knowledge of the distribution, detection and determinants of the performance in the industry of hedge funds. It also shows that a specific field such as the hedge fund industry can still tell us more about the sources of its performance as long as we use methodologies in adequacy with its behaviour, uses, constraints and habits. We believe that both our results and the methods we use pave the way for future research questions in this field, and are of the greatest interest for professionals of the industry as well. / Doctorat en Sciences économiques et de gestion / info:eu-repo/semantics/nonPublished
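The constraint set behind the Sharpe-style analysis discussed above (non-negative weights summing to 1) can be sketched with a simple rolling-window constrained least-squares fit; the thesis's time-varying coefficient model is a different and more refined estimator, and the simulated fund and index returns below are assumptions.

```python
# Rolling-window sketch of constrained (Sharpe-style) style analysis: at each date,
# regress fund returns on index returns with weights that are non-negative and sum to 1.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
T, k, window = 120, 4, 36
index_returns = rng.normal(0.005, 0.03, size=(T, k))
true_w = np.array([0.4, 0.3, 0.2, 0.1])                  # assumed "true" exposures
fund_returns = index_returns @ true_w + rng.normal(0, 0.01, T)

def style_weights(y, X):
    """Least squares with w >= 0 and sum(w) == 1."""
    n_assets = X.shape[1]
    res = minimize(lambda w: np.sum((y - X @ w) ** 2),
                   np.full(n_assets, 1 / n_assets),
                   bounds=[(0, 1)] * n_assets,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
    return res.x

weights_path = np.array([style_weights(fund_returns[t - window:t],
                                        index_returns[t - window:t])
                         for t in range(window, T)])
print("last estimated style weights:", np.round(weights_path[-1], 2))
```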
30

Détection des changements de points multiples et inférence du modèle autorégressif à seuil / Detection of abrupt changes and autoregressive models

Elmi, Mohamed Abdillahi 30 March 2018 (has links)
This thesis consists of two parts: the first part addresses the change-point (regime-switching) detection problem, and the second concerns the threshold autoregressive (TAR) model whose innovations are not independent; these two areas of statistics and probability come together in the literature and hence in this research project. In the first part, we study the detection of abrupt changes. Several methods exist for detecting change points, the main ones being the penalized least squares (PLS) method and the filtered derivative (FD) method introduced by Basseville and Nikiforov; other approaches, such as Bayesian change-point methods, also exist. We validate the new filtered derivative with false discovery rate (FDqV) method on real data (wind measurements on wind turbines and heartbeat series), and we give an extension of the FDqV method to the case of weakly dependent random variables. In the second part, we study the threshold autoregressive (TAR) model, which has been investigated in the literature by several authors such as Tong (1983), Petrucelli (1984, 1986) and Chan (1993), and which has numerous applications, for example in economics, biology and the environment. Until now, the TAR model has been studied in the case where the innovations are independent; in this work, we treat the case where the innovations are uncorrelated but not independent. We establish the asymptotic behaviour of the model's estimators, namely almost sure convergence, convergence in distribution, and uniform convergence of the parameter estimates.
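The filtered-derivative statistic underlying the FDqV method can be sketched as below: compare local means over sliding windows before and after each point and flag large differences. The window size and threshold are assumptions, and the second FDqV step (false discovery rate control over the candidates) is not reproduced.

```python
# Sketch of a filtered-derivative statistic for change-point detection: local mean
# after each point minus local mean before it, flagged when the jump is large.
import numpy as np

rng = np.random.default_rng(7)
signal = np.concatenate([rng.normal(0, 1, 300),
                         rng.normal(2, 1, 300),
                         rng.normal(-1, 1, 300)])   # two true change points

A = 30                                              # half-window size (assumed)
t = np.arange(A, signal.size - A)
after = np.array([signal[i:i + A].mean() for i in t])
before = np.array([signal[i - A:i].mean() for i in t])
fd = np.abs(after - before)                         # filtered derivative

threshold = 4 * np.sqrt(2 / A)                      # ~4 standard errors of the mean difference (assumed)
candidates = t[fd > threshold]
print("candidate change points near:", candidates[:5], "...", candidates[-5:])
```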
