  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Statistical inference for inequality measures based on semi-parametric estimators

Kpanzou, Tchilabalo Abozou 12 1900
Thesis (PhD)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics, especially for measuring inequality in income or wealth within a population and between populations. They also have applications in many other fields, e.g. ecology, linguistics, sociology, demography, epidemiology and information science. A large number of measures have been proposed to measure inequality. Examples include the Gini index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures depend inherently on the tails of the population (the underlying distribution), so their estimators are typically sensitive to data from these tails (nonrobust). For example, income distributions often exhibit a long right tail, leading to the frequent occurrence of large values in samples. Since the usual estimators are based on the empirical distribution function, they are usually nonrobust to such large values. Furthermore, heavy-tailed distributions often occur in real-life data sets, so remedial action needs to be taken in such cases. The remedial action can be either a trimming of the extreme data or a modification of the (traditional) estimator to make it more robust to extreme observations. In this thesis we follow the second option, modifying the traditional empirical distribution function as estimator to make it more robust. Using results from extreme value theory, we develop more reliable distribution estimators in a semi-parametric setting. These new estimators of the distribution then form the basis for more robust estimators of the measures of inequality. These estimators are developed for the four most popular classes of measures, viz. Gini, generalized entropy, Atkinson and quintile share ratio. Properties of such estimators are studied, especially via simulation.
Using limiting distribution theory and the bootstrap methodology, approximate confidence intervals are derived. Through various simulation studies, the proposed estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination, confidence interval length and coverage probability. In these studies the semi-parametric methods show a clear improvement over the standard ones. The theoretical properties of the quintile share ratio have not been studied much. Consequently, we also derive its influence function as well as the limiting normal distribution of its nonparametric estimator. These results have not previously been published. To illustrate the methods developed, we apply them to a number of real-life data sets and show how they can be used in practice for inference. To choose between the candidate parametric distributions, use is made of a measure of sample representativeness from the literature. These illustrations show that the proposed methods can be used to reach satisfactory conclusions in real-life problems.
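The sensitivity of the standard estimators to large observations, which motivates the thesis above, can be illustrated with a minimal sketch. It is not the thesis's semi-parametric estimator: the function names, the toy income values and the floor(n/5) rule for the quintile groups are assumptions made only for illustration. It computes the usual empirical Gini index and a simplified quintile share ratio, then replaces one income with an extreme value.

```python
def gini(values):
    """Empirical Gini index using the sorted-sample formula
    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def quintile_share_ratio(values):
    """Total of the largest fifth of the sample divided by the total of
    the smallest fifth. Simplification (assumption): each group holds
    floor(n / 5) observations, so at least 5 values are needed."""
    x = sorted(values)
    k = len(x) // 5
    return sum(x[-k:]) / sum(x[:k])


# Hypothetical toy incomes; the second sample swaps the largest value
# for a single extreme observation from the right tail.
incomes = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40]
contaminated = incomes[:-1] + [4000]

print("Gini:", gini(incomes), "->", gini(contaminated))
print("QSR: ", quintile_share_ratio(incomes), "->",
      quintile_share_ratio(contaminated))
```

Both measures jump after a single extreme value enters the sample, which is the nonrobustness the abstract describes for estimators built on the empirical distribution function.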
2

Contribuição para a análise de teletráfego com dependência de longa duração. / Contribution to the analysis of network traffic with long-range dependence.

Lipas Augusto, Marcelo 07 April 2009
The use of network traffic models that capture characteristics such as self-similarity and long-range dependence has increasingly proven to be a key element in the correct characterization of Local Area Network (LAN) and Wide Area Network (WAN) traffic [1, 2]. Such characterization is necessary to monitor and control traffic in converged networks [3]. In this context, accurate estimation of the self-similarity parameter, known as the Hurst parameter, becomes essential. However, studies show that, besides long-range dependence, WAN traffic may, not uncommonly, present mixed long- and short-range dependence characteristics [4, 5]. While a large body of scientific literature, both theoretical and practical, has focused on the accuracy of various Hurst parameter estimators [6, 7, 8, 9], little attention has been given to the estimation of this parameter in the presence of short-range dependence. This research focused on Hurst parameter estimation methods based on the wavelet spectrum, in particular the Abry-Veitch method [10] (which is based on the Discrete Wavelet Transform, DWT) and the spectrum obtained through the Discrete Wavelet Packet Transform (DWPT). The results based on the Abry-Veitch method show that, through a suitable adjustment of the estimation parameters, the method yields robust estimates in the presence of short-range dependent components, even when the regime of such a component changes, a desirable characteristic for real-time estimation of the Hurst parameter.
However, the considerable dispersion shown in some cases by the Abry-Veitch estimates motivated the study of the wavelet spectrum obtained via the DWPT as a basis for estimating the Hurst parameter. The results indicate that this transform generates a wavelet spectrum in which it is possible to detect whether or not short-range dependent components are present in the analyzed series. Finally, the research results are summarized and used to propose a real-time Hurst parameter estimation mechanism for the simultaneous presence of long- and short-range dependent components.
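The wavelet-spectrum approach described above can be sketched in a few lines. This is a simplified illustration in the spirit of a log-scale regression, not the Abry-Veitch estimator as published: it uses a hand-written Haar transform, weights each octave by its number of coefficients, and omits the bias correction of the log-energies. The function names, the octave ranges and the AR(1) coefficient 0.7 are assumptions. For a stationary long-range dependent series the log2 energy of the detail coefficients grows roughly linearly in the octave j with slope 2H - 1, so H is estimated as (slope + 1) / 2.

```python
import math
import random


def haar_details(series, max_octave):
    """Orthonormal Haar detail coefficients for octaves 1..max_octave."""
    approx = list(series)
    details = []
    for _ in range(max_octave):
        half = len(approx) // 2
        d = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        details.append(d)
    return details


def hurst_logscale(series, j_min, j_max):
    """Weighted regression of log2(mean squared detail) on the octave j,
    over octaves j_min..j_max; returns H = (slope + 1) / 2."""
    details = haar_details(series, j_max)
    js, ys, ws = [], [], []
    for j in range(j_min, j_max + 1):
        d = details[j - 1]
        js.append(j)
        ys.append(math.log2(sum(c * c for c in d) / len(d)))
        ws.append(len(d))  # more coefficients -> less noisy log-energy
    sw = sum(ws)
    j_bar = sum(w * j for w, j in zip(ws, js)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (j - j_bar) * (y - y_bar) for w, j, y in zip(ws, js, ys))
    den = sum(w * (j - j_bar) ** 2 for w, j in zip(ws, js))
    return (num / den + 1.0) / 2.0


rng = random.Random(0)
n = 2 ** 15

# White noise has no memory, so H should be close to 0.5.
white = [rng.gauss(0.0, 1.0) for _ in range(n)]

# AR(1) noise is purely short-range dependent (true H = 0.5 as well).
ar1 = [0.0] * n
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + rng.gauss(0.0, 1.0)

h_white = hurst_logscale(white, 1, 9)
h_all = hurst_logscale(ar1, 1, 9)      # fine octaves included
h_coarse = hurst_logscale(ar1, 5, 9)   # fine octaves excluded
print(h_white, h_all, h_coarse)
```

The lower octave bound plays the role of the "estimation parameter" adjustment mentioned in the abstract: short-range dependence distorts the fine octaves, so starting the regression at a coarser octave pulls the estimate for the AR(1) series back toward 0.5, at the cost of using fewer coefficients and therefore a more dispersed estimate.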
4

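Une procédure de sélection automatique de la discrétisation optimale de la ligne du temps pour des méthodes longitudinales d’inférence causale / An automatic procedure for selecting the optimal discretization of the timeline for longitudinal causal inference methods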

Ferreira Guerra, Steve 07 1900
No description available.
5

Sur l’inférence statistique pour des processus spatiaux et spatio-temporels extrêmes / On statistical inference for spatial and spatio-temporal extreme processes

Abu-Awwad, Abdul-Fattah 20 June 2019
Natural hazards such as heat waves, extreme wind speeds and heavy rainfall arise from physical processes and are spatial or spatio-temporal in extent. The development of models and inference methods for these processes is a very active area of research. This thesis deals with statistical inference for extreme and rare events in both spatial and spatio-temporal settings. Specifically, our contributions concern two classes of stochastic processes: spatial max-mixture processes and space-time max-stable processes. The proposed methodologies are illustrated by applications to rainfall data collected in the East of Australia and in a region of the State of Florida, USA.
In the spatial part, we consider hypothesis testing for the mixture parameter a of a spatial max-mixture model using two classical statistics: the Z-test statistic Za and the pairwise likelihood ratio statistic LRa. We compare their performance through an extensive simulation study. The pairwise likelihood is employed for estimation. Overall, the performance of the two statistics is satisfactory. Nevertheless, hypothesis testing presents some difficulties when a lies on the boundary of the parameter space, i.e., a ∈ {0,1}, because additional nuisance parameters are not identified under the null hypothesis. We apply this testing framework to an analysis of exceedances over a large threshold of daily rainfall data from the East of Australia. We also propose a novel estimation procedure to fit spatial max-mixture processes with unknown extremal dependence class. The novelty of this procedure is that it allows inference without specifying the distribution family before fitting, thereby letting the data speak for themselves. In particular, the estimation procedure uses a nonlinear least squares fit based on a closed-form expression of the so-called Fλ-madogram of max-mixture models, which contains the parameters of interest. We establish the consistency of the estimator of the mixing parameter a. An indication of asymptotic normality is given numerically. A simulation study shows that the proposed procedure improves empirical coefficients for the class of max-mixture models. In an analysis of monthly maxima of Australian daily rainfall data, we implement the proposed estimation procedure for exploratory and confirmatory purposes.
In the spatio-temporal part, based on a closed-form expression of the spatio-temporal F-madogram, we suggest a semi-parametric estimation methodology for space-time max-stable processes. This part provides a bridge between geostatistics and extreme value theory. In particular, for observations on a regular grid, the spatio-temporal F-madogram is estimated nonparametrically by its empirical version, and a moment-based procedure is applied to obtain parameter estimates. The performance of the method is investigated through an extensive simulation study. Afterward, we apply this method to quantify the extremal behavior of radar daily rainfall maxima from a region in the State of Florida. This approach could serve as an alternative or a first step to pairwise likelihood estimation. Indeed, the semi-parametric estimates could be used as starting values for the optimization algorithm that maximizes the pairwise log-likelihood function, in order to reduce the computational burden and also to improve statistical efficiency.
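To make the F-madogram mentioned above concrete, here is a minimal sketch of its empirical version for a single pair of sites. It is not the thesis's Fλ-madogram or its spatio-temporal estimator: it only shows the basic quantity ν = ½ E|F(Z1) − F(Z2)| with F replaced by ranks, and the standard conversion θ = (1 + 2ν) / (1 − 2ν) to the pairwise extremal coefficient, which holds for max-stable pairs. The function names and the toy max-linear model used to generate dependent data are assumptions.

```python
import random


def empirical_cdf_values(z):
    """Replace each observation by rank / (n + 1), an estimate of F(z)."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def f_madogram(z1, z2):
    """Empirical F-madogram: half the mean absolute difference of the
    rank-transformed observations at the two sites."""
    u1, u2 = empirical_cdf_values(z1), empirical_cdf_values(z2)
    return 0.5 * sum(abs(a - b) for a, b in zip(u1, u2)) / len(u1)


def extremal_coefficient(nu):
    """theta = (1 + 2 nu) / (1 - 2 nu); 1 = full dependence, 2 = independence."""
    return (1.0 + 2.0 * nu) / (1.0 - 2.0 * nu)


def unit_frechet(rng):
    """Unit Frechet draw as 1 / E with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against a zero draw
        e = rng.expovariate(1.0)
    return 1.0 / e


rng = random.Random(1)
n = 20000
a = 0.5  # hypothetical weight of the shared component

# Toy max-linear pair with unit Frechet margins and theta = 2 - a.
shared = [unit_frechet(rng) for _ in range(n)]
own1 = [unit_frechet(rng) for _ in range(n)]
own2 = [unit_frechet(rng) for _ in range(n)]
site1 = [max(a * s, (1 - a) * o) for s, o in zip(shared, own1)]
site2 = [max(a * s, (1 - a) * o) for s, o in zip(shared, own2)]

theta_dep = extremal_coefficient(f_madogram(site1, site2))
theta_indep = extremal_coefficient(f_madogram(own1, own2))
print(theta_dep, theta_indep)
```

In a moment-based procedure of the kind the abstract describes, such empirical values would be computed for many space-time lags and matched to the model's closed-form expression; that fitting step is not shown here.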
6

Regressão não-paramétrica com erros correlacionados via ondaletas. / Non-parametric regression with correlated errors using wavelets

Porto, Rogério de Faria 03 October 2008
In this thesis, rates of convergence to zero are obtained for the estimation risk of non-parametric regression using wavelets when the errors are correlated. Four wavelet-based non-parametric regression methods with unequally spaced designs are studied in the presence of correlated errors arising from stochastic processes. Conditions on the errors and adaptations to the procedures are presented under which the estimators achieve quasi-minimax rates of convergence. Whenever possible, rates of convergence are obtained for the estimators in the domain of the function, under mild conditions on the function to be estimated, on the design and on the error correlation. Through simulation studies, the behavior of some of the proposed methods is evaluated on finite samples.
In general, it is suggested to use one of the studied procedures, but with thresholds applied level by level. Since estimating the variance of the detail coefficients can be difficult in some cases, a general semi-parametric iterative procedure is also proposed for wavelet methods in the presence of time-series errors.
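The recommendation to apply thresholds level by level can be sketched as follows. This is a generic illustration of level-dependent soft thresholding, not one of the four methods studied in the thesis: it assumes an equally spaced design of dyadic length, a hand-written Haar transform, and a per-level noise scale estimated by the median absolute deviation (MAD / 0.6745) with the universal threshold sqrt(2 log n). All names and the test signal are assumptions. With correlated errors the noise variance of the detail coefficients differs across levels, which is why a single global threshold is replaced by one threshold per level.

```python
import math
import random
import statistics


def haar_dwt(signal, levels):
    """Orthonormal Haar DWT: returns (approximation, details per level)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        d = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        details.append(d)
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    a = list(approx)
    for d in reversed(details):
        rebuilt = []
        for ak, dk in zip(a, d):
            rebuilt.append((ak + dk) / math.sqrt(2))
            rebuilt.append((ak - dk) / math.sqrt(2))
        a = rebuilt
    return a


def soft(x, t):
    """Soft thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise_by_level(signal, levels):
    """Soft-threshold each detail level with its own MAD-based threshold."""
    n = len(signal)
    approx, details = haar_dwt(signal, levels)
    shrunk = []
    for d in details:
        sigma_j = statistics.median([abs(c) for c in d]) / 0.6745
        t_j = sigma_j * math.sqrt(2.0 * math.log(n))
        shrunk.append([soft(c, t_j) for c in d])
    return haar_idwt(approx, shrunk)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(2)
n = 1024
truth = [0.0 if t < n // 2 else 4.0 for t in range(n)]  # hypothetical step signal

# AR(1) errors as a simple example of correlated noise.
noise = [0.0] * n
for t in range(1, n):
    noise[t] = 0.6 * noise[t - 1] + rng.gauss(0.0, 1.0)
noisy = [s + e for s, e in zip(truth, noise)]

denoised = denoise_by_level(noisy, levels=5)
print(mse(noisy, truth), mse(denoised, truth))
```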
8

Estimation of the mincerian wage model addressing its specification and different econometric issues

Bhatti, Sajjad Haider 03 December 2012
In this doctoral thesis, we estimate Mincer's (1974) semi-logarithmic wage function using French and Pakistani labour force data. This model is a standard tool for estimating the relationship between earnings or wages and their contributing factors. Despite its wide use, simple estimation of the Mincerian model is biased because of several econometric problems. The main sources of bias noted in the literature are endogeneity of schooling, measurement error, and sample selectivity. We tackle the endogeneity and measurement error biases with an instrumental variables two-stage least squares approach, for which we propose two new instrumental variables. The first is defined as "the average years of schooling in the family of the concerned individual" and the second as "the average years of schooling in the country, of particular age group, of particular gender, at the particular time when an individual had joined the labour force". Schooling is found to be endogenous for both countries. Comparing the two instruments, we select the second as the more appropriate. We apply the Heckman (1979) two-step procedure to eliminate possible sample selection bias, which is found to be significantly positive for both countries. This means that, in both countries, people who decided not to participate in the labour force as wage workers would have earned less than participants had they decided to work as wage earners. We estimate a specification that tackles the endogeneity and sample selectivity problems together, because the existing literature contains relatively few such studies worldwide and none for France or Pakistan. The differences in the coefficients demonstrate the value of such a specification.
We also estimate the model semi-parametrically. Contrary to the general practice for the Mincerian model, our semi-parametric estimation takes its non-parametric component from the first-stage schooling equation rather than from the selection equation. For both countries, we find the parametric model to be more appropriate. We find the errors to be heteroscedastic in the data from both countries and therefore apply adaptive estimation to control the adverse effects of heteroscedasticity. Comparing the simple and adaptive estimations, we prefer the adaptive specification of the parametric model for both countries. Finally, we apply quantile regression to the model selected from the mean regression. Quantile regression reveals that the explanatory factors have different effects in different parts of the wage distribution in the two countries. For both Pakistan and France, this is the first study to correct for both sample selectivity and endogeneity in a single specification within a quantile regression framework.
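The endogeneity problem and the instrumental variables remedy described above can be illustrated with a small simulation. The Mincer (1974) wage function has the form ln(wage) = b0 + b1*schooling + b2*experience + b3*experience^2 + error. The sketch below is not the thesis's specification or data: it drops the experience terms, invents an unobserved "ability" variable that raises both schooling and wages, and uses a hypothetical instrument that shifts schooling only. All coefficient values are assumptions. With one endogenous regressor and one instrument, two-stage least squares reduces to the ratio cov(z, y) / cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Slope of the least squares regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Instrumental variables slope; equals 2SLS with a single instrument z."""
    return cov(z, y) / cov(z, x)


rng = random.Random(3)
n = 20000
true_return = 0.08  # assumed return to one extra year of schooling

instrument, schooling, log_wage = [], [], []
for _ in range(n):
    z = rng.gauss(0.0, 1.0)        # hypothetical instrument
    ability = rng.gauss(0.0, 1.0)  # unobserved, affects schooling and wages
    s = 12.0 + 1.5 * z + ability + rng.gauss(0.0, 1.0)
    y = 1.0 + true_return * s + 0.3 * ability + rng.gauss(0.0, 0.2)
    instrument.append(z)
    schooling.append(s)
    log_wage.append(y)

ols_estimate = ols_slope(schooling, log_wage)
iv_estimate = iv_slope(instrument, schooling, log_wage)
print(ols_estimate, iv_estimate)
```

In this simulated setting ordinary least squares overstates the return to schooling because ability is omitted, while the instrumental variables estimate stays close to the assumed value. The Heckman (1979) selection correction discussed in the abstract is a separate step and is not shown.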
9

Estimation of the mincerian wage model addressing its specification and different econometric issues / Estimation de la relation de salaires de Mincer : choix de specification et enjeux économétriques

Bhatti, Sajjad Haider 03 December 2012
In this thesis, our analytical framework rests on the estimation of the earnings function proposed by Mincer (1974). The aim is to revisit the specification of this model while addressing the related estimation problems. A further aim is to compare the French and Pakistani labour markets using a more robust specification. [...] However, according to a large literature, simple estimation of the Mincer model is biased because of several problems. [...] In this thesis, two new instrumental variables are proposed in an IV2SLS-type application. [...] According to the analysis carried out in this thesis, the second instrumental variable appears to be the more appropriate, since it has a weaker direct effect on the response variable than the first proposed instrumental variable. Moreover, the definition of this instrumental variable is more robust than that of the first. [...] To eliminate another potential source of bias in the estimation of the Mincer model, namely selection bias, the classic two-step correction method proposed by Heckman (1979) was applied. With this method, the selection bias was found to be positive and statistically significant for both countries. [...] In the literature on estimating the Mincer model, we noted that very few studies correct both sources of bias simultaneously and that no study of this kind exists for France or Pakistan. [...] In response, we estimate here a single specification that simultaneously corrects for sample selection bias and for the endogeneity bias of education. We also noted, again from the literature, that the robustness of the assumptions of the linear model used to estimate the Mincer model has rarely been discussed and tested. [...]
We therefore formally tested the validity of the homoscedasticity assumption by applying White's (1980) test. [...] Hence, to avoid the effects of heteroscedastic errors on the estimation process, we carried out an adaptive estimation of the Mincer model. [...] Based on the overall performance of the parametric and semi-parametric models, we found that, for France, both forms of estimation appear well specified. To keep estimation simple, the parametric model was selected as the most appropriate for the French data. For Pakistan, we concluded that the semi-parametric model produces results that, for some variables, disagree with the general consensus in Pakistan as well as with the international literature. [...] Therefore, as for the French data, we also chose the parametric model for the Pakistani data as the most robust for estimating the effects of the various explanatory factors on the wage determination process.
For both countries, after comparing the simple and adaptive versions of the parametric and semi-parametric models, we found that the parametric model in its adaptive specification performs best for estimating the effects of the various factors contributing to the wage determination process. Finally, we estimated the Mincer model in the parametric form selected from these estimations, as the most appropriate relative to the semi-parametric form, both by mean regression and by quantile regression. [...] The quantile regression method revealed that most explanatory variables influence wage earnings differently in different parts of the wage distribution, for both labour markets considered. / In this doctoral thesis, we estimated Mincer's (1974) semi-logarithmic wage function on French and Pakistani labour force data. This model is a standard tool for estimating the relationship between earnings/wages and their contributing factors. Despite its wide and extensive use, simple estimation of the Mincerian model is biased because of several econometric problems. The main sources of bias noted in the literature are endogeneity of schooling, measurement error, and sample selectivity. We tackled the endogeneity and measurement error biases with an instrumental variables two-stage least squares approach, for which we proposed two new instrumental variables. The first instrumental variable is defined as "the average years of schooling in the family of the concerned individual" and the second as "the average years of schooling in the country, of particular age group, of particular gender, at the particular time when an individual had joined the labour force". Schooling was found to be endogenous for both countries.
Comparing the two instruments, we selected the second as the more appropriate. We applied the Heckman (1979) two-step procedure to eliminate possible sample selection bias, which was found to be significantly positive for both countries. This means that, in both countries, people who decided not to participate in the labour force as wage workers would have earned less than participants had they decided to work as wage earners. We estimated a specification that tackles the endogeneity and sample selectivity problems together, because the existing literature contains few such studies worldwide and none for France or Pakistan. The differences in coefficients demonstrated the worth of such a specification. We also estimated the model semi-parametrically but, contrary to the general norm for the Mincerian model, our semi-parametric estimation contained a non-parametric component from the first-stage schooling equation rather than from the selection equation. For both countries, we found the parametric model to be more appropriate. We found the errors to be heteroscedastic for the data from both countries and therefore applied adaptive estimation to control the adverse effects of heteroscedasticity. Comparing simple and adaptive estimations, we prefer the adaptive specification of the parametric model for both countries. Finally, we applied quantile regression to the model selected from the mean regression. Quantile regression revealed that the explanatory factors have different effects in different parts of the wage distribution in the two countries. For both Pakistan and France, this would be the first study to correct for both sample selectivity and endogeneity in a single specification within a quantile regression framework.
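
As an illustration of the instrumental variables two-stage least squares idea mentioned in the abstract, here is a minimal sketch on synthetic data. It is not the thesis's code or data: the variable names, the simulated "ability" confounder, the instrument, and all coefficient values are assumptions chosen for the example. It shows how ordinary least squares overstates the return to schooling when schooling is endogenous, and how a valid instrument recovers the value used in the simulation.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already include a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, endog, instrument, exog):
    """2SLS with one endogenous regressor and one instrument.

    Returns coefficients ordered as [constant, exog, endog].
    """
    n = len(y)
    base = np.column_stack([np.ones(n), exog])
    # Stage 1: project the endogenous regressor on the instrument and exogenous terms.
    Z = np.column_stack([base, instrument])
    endog_hat = Z @ ols(Z, endog)
    # Stage 2: replace the endogenous regressor by its fitted values.
    return ols(np.column_stack([base, endog_hat]), y)


# Synthetic data (all values hypothetical): unobserved ability raises both
# schooling and wages, which makes schooling endogenous in the wage equation.
rng = np.random.default_rng(0)
n = 20_000
ability = rng.normal(0.0, 1.0, n)
instrument = rng.normal(12.0, 2.0, n)   # stands in for an average-schooling instrument
experience = rng.uniform(0.0, 30.0, n)
schooling = 2.0 + 0.8 * instrument + ability + rng.normal(0.0, 1.0, n)
log_wage = (1.0 + 0.08 * schooling + 0.03 * experience
            + 0.3 * ability + rng.normal(0.0, 0.2, n))

beta_ols = ols(np.column_stack([np.ones(n), experience, schooling]), log_wage)
beta_iv = two_stage_least_squares(log_wage, schooling, instrument, experience)

print("OLS return to schooling :", round(float(beta_ols[-1]), 4))
print("2SLS return to schooling:", round(float(beta_iv[-1]), 4))
```

The manual second stage gives the 2SLS point estimates only; standard errors computed naively from that second regression would be incorrect, so a dedicated routine would be needed for inference.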
10

Contribution à la statistique spatiale et l'analyse de données fonctionnelles / Contribution to spatial statistics and functional data analysis

Ahmed, Mohamed Salem 12 December 2017
This thesis deals with statistical inference for spatial and/or functional data. We are interested in estimating the unknown parameters of certain models from samples obtained by a random or non-random (stratified) sampling process and composed of independent or spatially dependent variables. The distinctive feature of the proposed methods is that they take into account the nature of the sample studied (a stratified sample or one composed of dependent spatial data). First, we study data taking values in an infinite-dimensional space, so-called "functional data". We begin with functional binary choice models under endogenous stratified sampling (case-control or choice-based sampling). The distinctive feature of this study is that the proposed method takes the sampling scheme into account. We describe a conditional likelihood function under the sampling scheme considered and a dimension reduction strategy in order to introduce conditional likelihood estimation of the model. We study the asymptotic properties of the proposed estimators and their application to simulated and real data. We then turn to a functional linear spatial autoregressive model. The particularity of the model lies in the functional nature of the explanatory variable and in the structure of the spatial dependence among the variables of the sample. The estimation procedure we propose consists of reducing the infinite dimension of the functional explanatory variable and maximizing a quasi-likelihood associated with the model.
We establish the consistency, asymptotic normality and numerical performance of the proposed estimators. In the second part of the thesis, we address regression and prediction problems for dependent real-valued variables. We begin by generalizing the k-nearest neighbors (k-NN) method to predict a spatial process at unobserved sites in the presence of spatial covariates. The distinctive feature of the proposed predictor is that it accounts for heterogeneity in the covariate used. We establish the almost complete convergence, with rate, of the predictor and give numerical results on simulated and environmental data. We then generalize the partially linear probit model for independent data to spatial data. We use a linear spatial process to model the disturbances of the process considered, which allows more flexibility and covers several types of spatial dependence. We propose a semi-parametric estimation approach based on weighted likelihood and the generalized method of moments, and study its asymptotic properties and numerical performance. A study on detecting risk factors for UADT (upper aerodigestive tract) cancer in the Nord region of France using spatial binary choice models concludes our contribution. / This thesis is about statistical inference for spatial and/or functional data. We are interested in estimating unknown parameters of some models from random or non-random (stratified) samples composed of independent or spatially dependent variables. The specificity of the proposed methods lies in the fact that they take into consideration the nature of the sample considered (stratified or spatial sample). We begin by studying data valued in a space of infinite dimension, so-called "functional data".
First, we study a functional binary choice model explored in a case-control or choice-based sample design context. The specificity of this study is that the proposed method takes into account the sampling scheme. We describe a conditional likelihood function under the sampling distribution and a dimension reduction strategy to define a feasible conditional maximum likelihood estimator of the model. Asymptotic properties of the proposed estimates as well as their application to simulated and real data are given. Secondly, we explore a functional linear autoregressive spatial model whose particularity lies in the functional nature of the explanatory variable and the structure of the spatial dependence. The estimation procedure consists of reducing the infinite dimension of the functional variable and maximizing a quasi-likelihood function. We establish the consistency and asymptotic normality of the estimator. The usefulness of the methodology is illustrated via simulations and an application to some real data. In the second part of the thesis, we address some estimation and prediction problems for real random spatial variables. We start by generalizing the k-nearest neighbors method, namely k-NN, to predict a spatial process at non-observed locations using some covariates. The specificity of the proposed k-NN predictor lies in the fact that it is flexible and allows for some heterogeneity in the covariate. We establish the almost complete convergence, with rates, of the spatial predictor, whose performance is illustrated on simulated and environmental data. In addition, we generalize the partially linear probit model for independent data to the spatial case. We use a linear process for the disturbances, allowing various spatial dependencies, and propose a semiparametric estimation approach based on weighted likelihood and the generalized method of moments.
We establish the consistency and asymptotic distribution of the proposed estimators and investigate the finite-sample performance of the estimators on simulated data. We end with an application of spatial binary choice models to identify UADT (upper aerodigestive tract) cancer risk factors in the north region of France, which displays the highest rates of such cancer incidence and mortality in the country.
