31 |
Modèles de régression linéaire pour variables explicatives fonctionnelles / Linear regression models for functional explanatory variables
Crambes, Christophe 23 November 2006
Functional data analysis is a branch of statistics whose development has intensified considerably in recent years. This thesis addresses functional regression problems in which the variations of a real-valued response variable are to be explained by a functional explanatory variable, that is, one taking values in a space of possibly infinite dimension. More precisely, we consider linear regression models. Two types of estimation are proposed: estimation of conditional quantiles and estimation of the conditional mean (the latter being considered first when the explanatory variable is noise-free, then when it is subject to measurement error). In each case, spline-based estimators are proposed as solutions of penalised minimisation problems, the penalty serving to circumvent the difficulty caused by the explanatory variable taking values in an infinite-dimensional space. Finally, we turn to the practical aspects of this study, first through simulations and then on a real data set concerning the forecasting of ozone pollution peaks in Toulouse.
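The conditional quantile estimation mentioned in this abstract rests, in the standard formulation, on the check (pinball) loss. The sketch below only illustrates that loss in the simplest scalar, unconditional case; it is not the penalised spline estimator of the thesis, and the function names and the sample values are made up for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile_by_loss(sample, tau):
    """Return the sample value that minimises the total pinball loss."""
    return min(sample, key=lambda q: sum(pinball_loss(y - q, tau) for y in sample))


# Made-up values, not data from the thesis.
ozone_peaks = [62.0, 71.0, 55.0, 90.0, 84.0, 67.0, 103.0, 78.0, 59.0]
median = empirical_quantile_by_loss(ozone_peaks, 0.5)
upper = empirical_quantile_by_loss(ozone_peaks, 0.9)
print(median, upper)
```

Minimising this loss over a constant recovers a sample quantile; a regression version would replace the constant by a function of the explanatory variable and add a penalty term.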
|
32 |
Processus empiriques, estimation non paramétrique et données censurées / Empirical processes, nonparametric estimation and censored data.
Viallon, Vivian 01 December 2006
The theory of empirical processes plays a central role in statistics, since it covers the general limit results concerning random samples. In particular, uniform laws of the logarithm have made it possible to treat the sup-norm convergence of kernel estimators systematically. In this thesis, we first obtain uniform functional laws of the logarithm for the increments of the normed quantile process, which yield new properties of estimators based on the k-nearest neighbours. The same type of result is then obtained for the increments of the Kaplan-Meier empirical process, leading naturally to uniform laws of the logarithm for estimators of the density and of the mortality (hazard) rate under right censoring. In multivariate regression, analogous laws are obtained for kernel estimators, notably in the censored case. Finally, we develop a nonparametric regression estimator under the additive model assumption with right censoring, which makes it possible to escape the curse of dimensionality. This estimator relies essentially on the marginal integration method.
|
33 |
Oscilação interdecadal do Pacífico e seus impactos no regime de precipitação no Estado de São Paulo / Pacific interdecadal Oscillation and its impacts on São Paulo State rainfall regime
Luciana Figueiredo Prado 07 January 2011
São Paulo State (SPS) plays a notable role in Brazil's development, in both the economic and the energy sectors, which justifies studying the behaviour of the climate in this region. Knowledge of rainfall variability is essential to water resources management and has a great impact on agriculture and on hydroelectric power generation. Previous studies have detected non-linear effects of El Niño-Southern Oscillation (ENSO) on SPS rainfall; however, no specific study of the influence of the Pacific interdecadal Oscillation (PDO) in this area has yet been carried out, although some impacts on South America are known. This work therefore studied the relationship between rainfall anomalies in SPS and the PDO from 1901 to 2007, in order to support research on climate forecasting for this region of Brazil. First, the rainfall regimes were described both for South America and locally for SPS, where topography and the influence of the Atlantic Ocean stood out. Annual and monthly quantiles were then calculated, which allowed each event to be classified according to its rainfall total. Homogeneous rainfall regions were determined in SPS on the basis of climatology and rainfall quantiles. A constructive relationship between ENSO events and PDO phases was observed, peaking during the austral summer. PDO signals are noticeable throughout SPS, mainly in the austral spring and summer. An additional analysis showed that the phases of the Atlantic Multidecadal Oscillation (AMO) also contribute to SPS rainfall: during the austral summer and spring on the coast, during summer in the interior, and throughout spring in the Serra da Mantiqueira region. Apparently, there is no relationship between ENSO events and the AMO.
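The quantile-based classification of rainfall events described in this abstract can be sketched as follows. This is a minimal illustration only: the abstract does not state which quantile levels or category names were used, so the tercile split, the labels "dry", "normal" and "wet", and the annual totals below are all assumptions.

```python
from statistics import quantiles


def classify_by_quantiles(totals, labels=("dry", "normal", "wet")):
    """Label each total by the quantile band it falls into."""
    # len(labels) - 1 cut points computed from the sample itself.
    cuts = quantiles(totals, n=len(labels), method="inclusive")

    def label(value):
        for cut, name in zip(cuts, labels):
            if value <= cut:
                return name
        return labels[-1]

    return [label(v) for v in totals]


# Made-up annual rainfall totals in millimetres, not data from the thesis.
annual_totals_mm = [1210, 1580, 990, 1430, 1340, 1750, 1120, 1490, 1285]
classes = classify_by_quantiles(annual_totals_mm)
for total, name in zip(annual_totals_mm, classes):
    print(total, name)
```

The same function applies to monthly totals by passing one month's series at a time; using more labels gives a finer classification.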
|
34 |
Intervalos de confiança para altos quantis oriundos de distribuições de caudas pesadas / Confidence intervals for high quantiles from heavy-tailed distributions.
Michel Helcias Montoril 10 March 2009
This work computes confidence intervals for high quantiles from heavy-tailed distributions. Four methods are used: the normal approximation, the likelihood ratio, data tilting and the generalised gamma method. A simulation study shows that, for data generated from the Weibull distribution, the intervals from the generalised gamma method have coverage probabilities very close to the confidence level, with smaller average lengths than the other three methods. For data generated from the Fréchet distribution, however, the likelihood ratio method gives the best intervals. The methods are also applied to a real data set of 1758 fire insurance claim payments, in Brazilian reais, made by a group of insurers in Brazil in 2003.
|
35 |
Formal Methods for Probabilistic Energy Models
Daum, Marcus 11 April 2019
The energy consumed by information processing systems contributes significantly to environmental pollution and accounts for a large share of operating costs. We therefore need to find ways to reduce the energy consumption of such systems. When trying to save energy, it is important to ensure that the utility (e.g., user experience) of a system is not unnecessarily degraded, which requires a careful trade-off analysis between the consumed energy and the resulting utility. Energy efficiency has therefore become a very active and important research topic that concerns many scientific areas and is also of interest to industrial companies.
The concept of quantiles is already well-known in mathematical statistics, but its benefits for the formal quantitative analysis of probabilistic systems have been noticed only recently. For instance, with the help of quantiles it is possible to reason about the minimal energy that is required to obtain a desired system behaviour in a satisfactory manner, e.g., a required user experience will be achieved with a sufficient probability. Quantiles also allow the determination of the maximal utility that can be achieved with a reasonable probability while staying within a given energy budget. As those examples illustrate important measures that are of interest when analysing energy-aware systems, it is clear that it is beneficial to extend formal analysis-methods with possibilities for the calculation of quantiles.
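As a rough illustration of the first kind of quantile described above (the minimal energy needed to reach a goal with sufficient probability), the sketch below works on a toy discrete-time Markov chain. It is a simplification under stated assumptions, not the algorithms of the monograph: there is no nondeterminism, every non-goal state has a positive integer energy cost, and the chain, the state names and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy chain: state -> (energy cost of leaving it, {successor: probability}).
# Every non-goal state must cost at least 1, otherwise the recursion below would not terminate.
CHAIN = {
    "send": (1, {"done": 0.5, "send": 0.5}),
}
GOAL = "done"


@lru_cache(maxsize=None)
def reach_prob(state, budget):
    """Probability of reaching GOAL from `state` while spending at most `budget` energy."""
    if state == GOAL:
        return 1.0
    cost, successors = CHAIN[state]
    if cost > budget:
        return 0.0
    return sum(p * reach_prob(nxt, budget - cost) for nxt, p in successors.items())


def energy_quantile(state, threshold, max_budget=1000):
    """Smallest budget whose reachability probability meets `threshold`, or None if none is found."""
    for budget in range(max_budget + 1):
        if reach_prob(state, budget) >= threshold:
            return budget
    return None


needed = energy_quantile("send", 0.9)
print(needed)
```

In this chain each attempt costs one unit of energy and succeeds with probability 0.5, so the search simply increases the budget until the success probability reaches the threshold. Models with nondeterministic choices or zero-cost steps need the more general machinery the abstract refers to.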
In this monograph, we will see how we can take advantage of those quantiles as an instrument for analysing the trade-off between energy and utility in the field of probabilistic model checking. Therefore, we present algorithms for their computation over Markovian models. We will further investigate different techniques in order to improve the computational performance of implementations of those algorithms. The main feature that enables those improvements takes advantage of the specific characteristics of the linear programs that need to be solved for the computation of quantiles. Those improved algorithms have been implemented and integrated into the well-known probabilistic model checker PRISM. The performance of this implementation is then demonstrated by means of different protocols with an emphasis on the trade-off between the consumed energy and the resulting utility. Since the introduced methods are not restricted to the case of an energy-utility analysis only, the proposed framework can be used for analysing the interplay of cost and its resulting benefit in general.
1 Introduction
1.1 Related work
1.2 Contribution and outline
2 Preliminaries
3 Reward-bounded reachability properties and quantiles
3.1 Essentials
3.2 Dualities
3.3 Upper-reward bounded quantiles
3.3.1 Precomputation
3.3.2 Computation scheme
3.3.3 Qualitative quantiles
3.4 Lower-reward bounded quantiles
3.4.1 Precomputation
3.4.2 Computation scheme
3.5 Energy-utility quantiles
3.6 Quantiles under side conditions
3.6.1 Upper reward bounds
3.6.2 Lower reward bounds
3.6.2.1 Maximal reachability probabilities
3.6.2.2 Minimal reachability probabilities
3.7 Reachability quantiles and continuous time
3.7.1 Dualities
4 Expectation Quantiles
4.1 Computation scheme
4.2 Arbitrary models
4.2.1 Existential expectation quantiles
4.2.2 Universal expectation quantiles
5 Implementation
5.1 Computation optimisations
5.1.1 Back propagation
5.1.2 Reward window
5.1.3 Topological sorting of zero-reward sub-MDPs
5.1.4 Parallel computations
5.1.5 Multi-thresholds
5.1.6 Multi-state solution methods
5.1.7 Storage for integer sets
5.1.8 Elimination of zero-reward self-loops
5.2 Integration in Prism
5.2.1 Computation of reward-bounded reachability probabilities
5.2.2 Computation of quantiles in CTMCs
6 Analysed Protocols
6.1 Prism Benchmark Suite
6.1.1 Self-Stabilising Protocol
6.1.2 Leader-Election Protocol
6.1.3 Randomised Consensus Shared Coin Protocol
6.2 Energy-Aware Protocols
6.2.1 Energy-Aware Job-Scheduling Protocol
6.2.1.1 Energy-Aware Job-Scheduling Protocol with side conditions
6.2.1.2 Energy-Aware Job-Scheduling Protocol and expectation quantiles
6.2.1.3 Multiple shared resources
6.2.2 Energy-Aware Bonding Network Device (eBond)
6.2.3 HAECubie Demonstrator
6.2.3.1 Operational behaviour of the protocol
6.2.3.2 Formal analysis
7 Conclusion
7.1 Classification
7.2 Future prospects
Bibliography
List of Figures
List of Tables
|
36 |
Application Of Statistical Methods In Risk And Reliability
Heard, Astrid 01 January 2005
The dissertation considers the construction of confidence intervals for a cumulative distribution function F(z) and its inverse at some fixed points z and u on the basis of an i.i.d. sample where the sample size is relatively small. The sample is modeled as having the flexible Generalized Gamma distribution with all three parameters being unknown. This approach can be viewed as an alternative to nonparametric techniques, which do not specify the distribution of X and lead to less efficient procedures. The confidence intervals are constructed by objective Bayesian methods and use the Jeffreys noninformative prior. Performance of the resulting confidence intervals is studied via Monte Carlo simulations and compared to the performance of nonparametric confidence intervals based on binomial proportion. In addition, techniques for change point detection are analyzed and further evaluated via Monte Carlo simulations. The effect of a change point on the interval estimators is studied both analytically and via Monte Carlo simulations.
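For orientation, one common nonparametric interval for a quantile uses two order statistics, with coverage given by a binomial sum. The abstract does not say which construction the dissertation uses as its benchmark, so the sketch below is an assumption about the general technique; the function names, ranks and sample values are made up.

```python
from math import comb


def coverage(n, p, lower_rank, upper_rank):
    """P(X_(lower_rank) <= p-quantile < X_(upper_rank)) for a continuous distribution.

    The number of observations at or below the quantile is Binomial(n, p),
    and the event holds when that count lies in [lower_rank, upper_rank - 1].
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lower_rank, upper_rank))


def quantile_interval(sample, p, lower_rank, upper_rank):
    """Return (lower bound, upper bound, coverage) from 1-indexed order statistics."""
    ordered = sorted(sample)
    level = coverage(len(ordered), p, lower_rank, upper_rank)
    return ordered[lower_rank - 1], ordered[upper_rank - 1], level


# Made-up lifetimes, not data from the dissertation.
lifetimes = [4.1, 7.3, 2.8, 9.6, 5.5, 6.2, 3.9, 8.4, 5.0, 7.9]
low, high, level = quantile_interval(lifetimes, 0.5, 2, 9)
print(low, high, level)
```

With small samples the attainable coverage levels are discrete and the interval can be wide, which is the inefficiency a parametric model such as the Generalized Gamma is meant to reduce.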
|
37 |
A Statistical Analysis of Muscle Fiber Area
Rohlén, Robin January 2014
In the present study, the cross-sectional areas of individual muscle fibers were investigated with a focus on statistical methodology. This thesis includes data from two studies: the Resistance Study and the Method Study. The Resistance Study analyzes the effect of exercise by comparing muscle fiber area before and after eight weeks of resistance training. Muscle biopsies from the vastus lateralis muscle were obtained from young male participants. The purpose of the Method Study was to examine the variation between the right and left leg. Contrary to previous studies, this thesis focuses on individual data rather than on group-based data, and therefore takes a different approach from the previously published articles. This proved worthwhile, because group-wise analysis loses information: the increase in small muscle fibers did not show up when the data were analyzed as a group. The results of the Resistance Study are similar to the results of the Method Study. Means and standard deviations have a wide spread both between subjects and between biopsies taken from the same subject. Inference on the 10th and 90th percentiles shows a positive pattern in the Resistance Study, in the sense that both the smallest and the largest muscle fibers have grown as a result of the resistance training. If muscle fiber area is used as a proxy for training effect, the conclusion is that many people seem to have responded well to the training.
|
38 |
Family, Work and Welfare States in Europe: Women's Juggling with Multiple Roles / Famille, Emploi et Etat-providence: la jonglerie des femmes avec leurs multiples rôles
O'Dorchai, Síle S. 24 January 2007
The general focus of this thesis is on how the family, work and the welfare system are intertwined. A major determinant is the way responsibilities are shared by the state, the market and civil society in different welfare state regimes. An introductory chapter will therefore be dedicated to the development of the social dimension in the process of European integration. A first chapter will then go deeper into the comparative analysis of welfare state regimes, to comment on the provision of welfare in societies with a different mix of state, market and societal welfare roles and to assess the adequacy of existing typologies as reflections of today’s changed socio-economic, political and gender reality. Although they stand strong on their own, these first two chapters also contribute to contextualising the research subject of the remainder of the thesis: the study and comparison of the differential situation of women and men and of mothers and non-mothers on the labour markets of the EU-15 countries as well as of the role of public policies with respect to the employment penalties faced by women, particularly in the presence of young children. In our analysis, employment penalties are understood in three ways: (i) the difference in full-time equivalent employment rates between mothers and non-mothers, (ii) the wage penalty associated with motherhood, and (iii) the wage gap between part-time and full-time workers, considering men and women separately. Employment outcomes and public policies are thus assessed not only from a gender point of view but also comparatively for mothers and non-mothers. Because women choose to take part in paid employment, fertility rates will depend on their possibilities to combine employment and motherhood. As a result, motherhood-induced employment penalties and the role of public policies to tackle them should be given priority attention, not just by scholars, but also by politicians and policy-makers.
|
39 |
Modely neuronových sítí pro podmíněné kvantily finančních výnosů a volatility / Neural network models for conditional quantiles of financial returns and volatility
Hauzr, Marek January 2016
This thesis investigates the performance of Quantile Regression Neural Networks in forecasting multiperiod quantiles of realized volatility and quantiles of returns. It relies on model-free measures of realized variance and its components (realized variance, median realized variance, integrated variance, jump variation and positive and negative semivariances). The data used are S&P 500 futures and WTI Crude Oil futures contracts. The resulting models of returns and volatility perform well both in absolute terms and relative to linear quantile regression models. In sample, the models estimated by Quantile Regression Neural Networks provide better estimates than the linear quantile regression models; out of sample, the two are equally good.
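Two of the model-free measures listed in this abstract have simple textbook definitions: realized variance is the sum of squared intraday returns, and the positive and negative semivariances split that sum by the sign of the return. The sketch below uses those standard definitions, which may differ in detail from the thesis; the function name and the returns are made up.

```python
def realized_measures(returns):
    """Return (realized variance, positive semivariance, negative semivariance)."""
    realized_variance = sum(r * r for r in returns)
    positive_semivariance = sum(r * r for r in returns if r > 0)
    negative_semivariance = sum(r * r for r in returns if r < 0)
    return realized_variance, positive_semivariance, negative_semivariance


# Made-up intraday returns in percent, not data from the thesis.
intraday_returns = [0.5, -0.25, 0.25, -0.5, 0.0]
rv, rs_pos, rs_neg = realized_measures(intraday_returns)
print(rv, rs_pos, rs_neg)
```

By construction the two semivariances add up to the realized variance, so they can serve as separate predictors for upside and downside movements.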
|
40 |
Développement de représentations et d'algorithmes efficaces pour l'apprentissage statistique sur des données génomiques / Learning from genomic data: efficient representations and algorithms.
Le Morvan, Marine 03 July 2018
Since the first sequencing of the human genome in the early 2000s, large endeavours have set out to map the genetic variability among individuals, or the DNA alterations in tumour cells. They have laid the foundations for the emergence of precision medicine, which aims to integrate an individual's genetic specificities with their conventional medical record in order to better adapt treatments and prevention strategies. Translating DNA variations and alterations into phenotypic predictions is, however, a difficult problem. DNA sequencers and microarrays measure more variables than there are samples, which poses statistical issues. The raw data are also subject to technical biases and to the noise inherent in these technologies. Finally, the vast and intricate networks of interactions among proteins obscure the impact of DNA variations on cell behaviour, prompting the need for predictive models that can capture a certain degree of complexity.
This thesis presents novel methodological contributions to address these challenges. First, we define a novel representation for tumour mutation profiles that exploits their position in protein-protein interaction networks. For certain cancers, this representation improves survival predictions from mutation data and stratifies patient cohorts into informative subgroups. Second, we present a new learning method that jointly handles data normalisation and the estimation of a linear model. Our experiments show that it improves predictive performance compared with handling normalisation and estimation sequentially.
Finally, we propose a new algorithm that speeds up the estimation of sparse linear models with two-way interactions. The resulting speed-up makes this estimation possible and efficient for data sets with several hundred thousand main effects, thereby extending the scope of such models to data from genome-wide association studies.
|