About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Statistical Methods for Variability Management in High-Performance Computing

Xu, Li 15 July 2021 (has links)
High-performance computing (HPC) variability management is an important topic in computer science. Research topics include experimental designs for efficient data collection, surrogate models for predicting performance variability, and system configuration optimization. Due to the complex architecture of HPC systems, a comprehensive study of HPC variability needs large-scale datasets, and experimental design techniques are useful for improved data collection. Surrogate models, which can be obtained by mathematical and statistical modelling, are essential to understand the variability as a function of system parameters. After predicting the variability, optimization tools are needed for future system designs. This dissertation focuses on HPC input/output (I/O) variability through three main chapters. After the general introduction in Chapter 1, Chapter 2 focuses on prediction models for a scalar description of I/O variability: a comprehensive comparison study is conducted, major surrogate models for computer experiments are investigated, and a tool is developed for system configuration optimization based on the chosen surrogate model. Chapter 3 presents a detailed study of the multimodal phenomena in I/O throughput distributions and proposes an uncertainty estimation method for the optimal number of runs for future experiments. Mixture models are used to identify the number of modes of the throughput distribution at different configurations; the chapter also addresses the uncertainty in parameter estimation and derives a formula for sample size calculation. The developed method is then applied to HPC variability data. Chapter 4 focuses on the prediction of functional outcomes with both qualitative and quantitative factors: instead of a scalar description of I/O variability, the distribution of I/O throughput provides a comprehensive description of I/O variability.
We develop a modified Gaussian process for functional prediction and apply the developed method to the large-scale HPC I/O variability data. Chapter 5 contains some general conclusions and areas for future work. / Doctor of Philosophy / This dissertation focuses on three projects that are all related to statistical methods in performance variability management in high-performance computing (HPC). HPC systems are computer systems that create high performance by aggregating a large number of computing units. The performance of HPC is measured by the throughput of a benchmark called the IOzone Filesystem Benchmark. The performance variability is the variation among throughputs when the system configuration is fixed. Variability management involves studying the relationship between performance variability and the system configuration. In Chapter 2, we use several existing prediction models to predict the standard deviation of throughputs given different system configurations and compare the accuracy of the predictions. We also conduct HPC system optimization using the chosen prediction model as the objective function. In Chapter 3, we use mixture models to determine the number of modes in the distribution of throughput under different system configurations. In addition, we develop a model to determine the number of additional runs for future benchmark experiments. In Chapter 4, we develop a statistical model that can predict the throughput distributions given the system configurations. We also compare the prediction of summary statistics of the throughput distributions with existing prediction models.
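The mixture-model idea from Chapter 3 — counting the modes of a throughput distribution — can be sketched with scikit-learn's GaussianMixture and BIC-based model selection. The data below are synthetic, and the two-mode structure is an assumption chosen for illustration; this is not the dissertation's code or data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic bimodal "throughput" sample: two operating modes of a system.
throughput = np.concatenate([
    rng.normal(100.0, 5.0, 400),
    rng.normal(150.0, 5.0, 400),
]).reshape(-1, 1)

# Fit mixtures with 1..4 components and pick the number of modes by BIC.
bics = {}
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(throughput)
    bics[k] = gm.bic(throughput)

best_k = min(bics, key=bics.get)
print(best_k)  # 2 for this well-separated sample
```

With well-separated components, BIC reliably recovers the number of modes; with overlapping modes, the choice is less clear-cut and uncertainty quantification (as in the thesis) matters.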
92

The asymmetry of the New Keynesian Phillips Curve in the euro-area

Chortareas, G., Magkonis, Georgios, Panagiotidis, T. January 2012 (has links)
Using a two-stage quantile regression framework, we uncover significant asymmetries across quantiles for all coefficients in an otherwise standard New Keynesian Phillips Curve (NKPC) for the euro area. A pure NKPC specification accurately captures inflation dynamics at high inflation quantiles.
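The kind of asymmetry described — coefficients that differ across quantiles of the inflation distribution — can be illustrated with statsmodels' quantile regression on synthetic data. The data-generating process below (noise scale growing with the regressor) is a hypothetical stand-in for the euro-area data, not the paper's two-stage estimator.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
# Synthetic data in which the slope differs across quantiles: the noise
# scale grows with the regressor, so upper quantiles react more strongly.
mc = rng.uniform(0.0, 2.0, n)                        # "marginal cost" proxy
infl = 1.0 + 0.5 * mc + (0.5 + 0.5 * mc) * rng.normal(size=n)

df = pd.DataFrame({'infl': infl, 'mc': mc})
slopes = {tau: smf.quantreg('infl ~ mc', df).fit(q=tau).params['mc']
          for tau in (0.1, 0.5, 0.9)}
print({tau: round(s, 2) for tau, s in slopes.items()})
```

A mean regression would report a single slope (about 0.5 here) and miss the tail behaviour entirely.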
93

A Comparison of Statistical Methods to Generate Short-Term Probabilistic Forecasts for Wind Power Production Purposes in Iceland / En jämförelse av statistiska metoder för att generera kortsiktiga probabilistiska prognoser för vindkraftsproduktion på Island

Jóhannsson, Arnór Tumi January 2022 (has links)
Accurate forecasts of wind speed and power production are of great value for wind power producers. In Southwest Iceland, wind power installations are being planned by various entities. This study aims to create optimal wind speed and wind power production forecasts for wind power production in Southwest Iceland by applying statistical post-processing methods to a deterministic HARMONIE-AROME forecast at a single point in space. Three such methods were implemented for a 22-month set of forecast-observation samples at 1 h resolution: Temporal Smoothing (TS), Observational Distributions on Discrete Intervals (ODDI, a relatively simple classification algorithm) and Quantile Regression Forest (QRF, a relatively complex machine learning algorithm). Wind power forecasts were derived directly from forecasts of wind speed using an idealized power curve. Four metrics were given equal weight in the evaluation of the methods: Root Mean Square Error (RMSE), Miss Rate of the 95-percent forecast interval (MR95), Mean Median Forecast Interval Width (MMFIW, a measure of forecast sharpness) and Continuous Ranked Probability Score (CRPS). Of the three methods, TS performed inadequately, while ODDI and QRF performed significantly better and similarly to each other. Both ODDI and QRF predict wind speed and power production slightly more accurately than deterministic AROME in terms of RMSE. In addition to the overall evaluation of all three methods, ODDI and QRF were evaluated conditionally. The results indicate that QRF performs significantly better than ODDI at forecasting wind speed and wind power at wind speeds above 13 m/s; otherwise, no strong discrepancies were found between their conditional performance. The results of this study are limited by a relatively scarce data set and correspondingly short time series.
The results indicate that applying statistical post-processing methods of varying complexity to deterministic wind speed forecasts is a viable approach to gaining a probabilistic insight into the wind power potential at a given location.
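Two ingredients of the evaluation above can be sketched in a few lines: an idealized power curve mapping wind speed to (fractional) power, and the sample-based CRPS for an ensemble forecast. The cut-in/rated/cut-out thresholds below are illustrative assumptions, not the values used in the thesis.

```python
import numpy as np

def power_curve(v, cut_in=3.0, rated=13.0, cut_out=25.0):
    """Idealized turbine power curve (fraction of rated power).

    Cubic ramp between cut-in and rated speed; thresholds are hypothetical."""
    v = np.asarray(v, dtype=float)
    p = np.clip((v**3 - cut_in**3) / (rated**3 - cut_in**3), 0.0, 1.0)
    return np.where((v < cut_in) | (v > cut_out), 0.0, p)

def crps_ensemble(members, obs):
    """Sample-based CRPS: E|X - y| - 0.5 E|X - X'| (lower is better)."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

rng = np.random.default_rng(2)
obs_speed = 10.0
ens = rng.normal(obs_speed, 1.5, 50)          # a 50-member speed "forecast"
score = crps_ensemble(power_curve(ens), power_curve(obs_speed))
print(round(float(score), 3))
```

Deriving power forecasts by pushing each ensemble member through the power curve, as done here, mirrors the thesis's approach of deriving power directly from wind speed.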
94

Quantile regression in risk calibration

Chao, Shih-Kang 05 June 2015 (has links)
Quantile regression studies the conditional quantile function Q_{Y|X}(τ), satisfying F_{Y|X}[Q_{Y|X}(τ)] = τ for all τ ∈ (0, 1), where F_{Y|X} is the conditional CDF of Y given X. Quantile regression allows for a closer inspection of the conditional distribution beyond the conditional moments. This technique is particularly useful for, for example, the Value-at-Risk (VaR), which the Basel accords (2011) require all banks to report, or the "quantile treatment effect" and "conditional stochastic dominance (CSD)", economic concepts for measuring the effectiveness of a government policy or a medical treatment.
Despite its wide applicability, quantile regression is more challenging to develop than mean regression: it requires a sure command of general regression problems and M-estimators, and one must deal with non-smooth loss functions. In this dissertation, Chapter 2 is devoted to empirical risk management during financial crises using quantile regression. Chapters 3 and 4 address the issue of high dimensionality and nonparametric techniques of quantile regression.
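The "non-smooth loss function" at the heart of quantile regression is the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}); minimizing its empirical mean over a constant recovers the sample τ-quantile. A minimal numpy sketch of that fact:

```python
import numpy as np

def pinball(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(3)
y = rng.normal(size=10001)

tau = 0.95
# Mean check loss over constants c is minimised at the sample tau-quantile.
grid = np.quantile(y, np.linspace(0.01, 0.99, 99))
losses = [np.mean(pinball(y - c, tau)) for c in grid]
best = grid[int(np.argmin(losses))]
print(round(float(best), 3), round(float(np.quantile(y, tau)), 3))
```

The kink of ρ_τ at zero is exactly what makes quantile estimators harder to analyse than least squares, whose loss is smooth.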
95

Modelling Conditional Quantiles of CEE Stock Market Returns

Tóth, Daniel January 2015 (has links)
Correctly specified models to forecast returns of indices are important for investors to minimize risk on financial markets. This thesis focuses on conditional Value at Risk modeling, employing a flexible quantile regression framework and hence avoiding assumptions on the return distribution. We apply semiparametric linear quantile regression (LQR) models with realized variance, and also models with positive and negative semivariance, which allow for direct modelling of the quantiles. Four European stock price indices are taken into account: the Czech PX, Hungarian BUX, German DAX and London FTSE 100. The objective is to investigate how the use of realized variance influences VaR accuracy and the correlation between the Central & Eastern and Western European indices. The main contribution is the application of LQR models for modelling conditional quantiles and the comparison of the correlation between European indices using realized measures. Our results show that linear quantile regression models on one-step-ahead forecasts provide a better fit and more accurate modelling than the classical VaR model with its assumption of normally distributed returns. Therefore LQR models with realized variance can be used as an accurate tool for investors. Moreover we show that diversification benefits are...
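The contrast with a normal-assumption VaR can be illustrated by a simple in-sample coverage check on synthetic heavy-tailed returns. The Student-t sample below is a hypothetical stand-in for index returns; the thesis itself uses realized-variance LQR models, not this toy.

```python
import numpy as np

rng = np.random.default_rng(4)
# Heavy-tailed daily "returns": a scaled Student-t sample (illustrative).
returns = 0.01 * rng.standard_t(df=3, size=4000)

# In-sample 1% VaR two ways: normal assumption vs the empirical quantile,
# the latter being what a quantile-regression model targets directly.
z01 = -2.3263478740408408                  # 1% quantile of N(0, 1)
var_normal = returns.mean() + returns.std() * z01
var_empirical = np.quantile(returns, 0.01)

viol_normal = np.mean(returns < var_normal)      # should be ~0.01 if adequate
viol_empirical = np.mean(returns < var_empirical)
print(float(viol_normal), float(viol_empirical))
```

The quantile-based VaR hits its nominal 1% violation rate by construction, while the normal-based VaR typically misstates coverage when the tails are heavy.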
96

O diferencial de notas entre as escolas públicas e privadas no Brasil: uma nova abordagem quantílica / The test scores differences between public and private schools in Brazil: a new quantile approach

Moraes, André Guerra Esteves de 14 June 2012 (has links)
This study aims to bring robustness to the results observed in comparative studies of public and private schools in Brazil, which indicate a greater capacity of the private school network to generate educational quality. To that end, a quantile approach based on selection on observables was applied; unlike other approaches, the one used in this dissertation has asymptotic inference. The data come from the 2005 SAEB mathematics tests for the 8th grade. Once again the superiority of private schools was evident, even after controlling for various student, teacher and school covariates.
This result strengthens the case for voucher policies for private schools, although additional studies on the subject are necessary. Among the covariates that would reduce the gap between the grade distributions of public and private school students, peer group effects (factors determined by the composition of the student body in the school and classroom) were found to be the most important. This corroborates other studies that highlight the importance of peer group effects in explaining the greater effectiveness of private schools relative to public schools.
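The quantile-by-quantile comparison of two score distributions can be sketched as follows. The numbers are synthetic, and the dissertation's actual estimator additionally conditions on covariates and corrects for selection on observables.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic test scores: the "private" distribution is shifted up, with the
# gap assumed larger in the lower tail (hypothetical parameters).
public = rng.normal(230, 40, 5000)
private = rng.normal(260, 30, 5000)

taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
gap = np.quantile(private, taus) - np.quantile(public, taus)
print({float(t): float(round(g, 1)) for t, g in zip(taus, gap)})
```

Reporting the gap at several quantiles, rather than the mean difference alone, reveals whether a school sector helps weaker or stronger students more.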
97

Análise da política de dividendos: uma aplicação de regressão quantílica / Dividend policy analysis: an application of quantile regression

Ströher, Jéferson Rodrigo 31 March 2015 (has links)
Dividend policy is important because it involves the decision of whether to distribute financial resources, through dividends or interest on own capital, at different percentages and, in the Brazilian case, under different tax treatments. The objective of this work was to identify which factors affect the payout ratio, such as size, liquidity, profitability, leverage, investment, profit, revenue and ownership concentration. These aspects were analysed with quantile regression, using data from the Economática database for firms listed on the BM&FBovespa from 2009 to 2013, comprising 3,073 observations.
The estimates support the following conclusions: a) there is a positive relation between firm size and the dependent variable at all quantiles; b) liquidity is positively related to payout at the 0.5 and 0.75 quantiles; c) profitability shows a positive relation from the 0.5 quantile upwards; d) leverage shows a negative relation from the 0.5 quantile upwards; e) investment shows a negative relation at the 0.75 and 0.9 quantiles; f) net profit shows a negative relation at the 0.5 quantile; g) revenue shows a positive relation at the median and at the 0.9 quantile; h) concentration is not statistically significant at 1%, although the relation is positive at the 0.75 and 0.9 quantiles and negative at the others; i) of two dummies, the financial-firm dummy shows a positive relation from the 0.1 to the 0.5 quantile, and the dummy for firms that distributed dividends despite losses shows an increasingly strong negative association. All of these relations describe how payout varies with the variables above.
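A pattern like item h), where a coefficient is negative at low quantiles and positive at high ones, arises when a regressor widens the spread of the outcome rather than shifting its location. A sketch with statsmodels' quantile regression on synthetic data (the variable names and data-generating process are hypothetical, not the thesis's sample):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 3000
x = rng.uniform(0.5, 2.0, n)          # e.g. ownership concentration (scaled)
y = 1.0 + x * rng.normal(size=n)      # x widens the "payout" distribution

df = pd.DataFrame({'payout': y, 'conc': x})
coef = {t: smf.quantreg('payout ~ conc', df).fit(q=t).params['conc']
        for t in (0.1, 0.5, 0.9)}
print({t: round(c, 2) for t, c in coef.items()})
```

Mean regression would estimate a slope near zero here and conclude the variable is irrelevant; the quantile slopes show it matters in both tails, with opposite signs.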
98

Há espaços para melhora no setor leiteiro? Uma análise de fronteira estocástica de produção e regressão quantílica utilizando dados do Censo Agropecuário 2006 (IBGE) / Is there room for improvement in the dairy sector? A stochastic production frontier and quantile regression analysis using data from the 2006 agricultural census (IBGE)

Brito, Ricardo Alves de 25 August 2016 (has links)
The dairy sector has expanded worldwide over recent years, partly because of new technologies adopted in recent decades and partly because of the reduction or removal of trade barriers. At the same time, the number of dairy farms has fallen. Since milk is a commodity, prices follow market movements of supply and demand, and no single agent has enough power to influence buying or selling prices. As CEPEA bulletins show, prices last year remained below the historical average of the last decade, while the terms of trade, measured as the litres of milk needed to buy inputs and pesticides, remained stable with an upward trend. In view of this problem, a better understanding of how the dairy production system works is needed.
Using single-output stochastic production frontiers (milk as the only output), multi-output frontiers (milk and other animal products on the farms) and quantile regression for several quantiles of milk production, this work satisfactorily identifies which inputs offer the best returns to producers and analyses efficiency factors (BATTESE; COELLI, 1995; CHIDMI; SOLÍS; CABRERA, 2011). The results point to the need to account for the interrelation among inputs (a translog production function) and identify capital-related inputs (number of milked cows and expenditure on machinery and equipment) and labour (expenditure on wages) as the main inputs of the activity. Expenditure on animal drugs and on electricity, and the area available for livestock, proved counterproductive, indicating misuse or overuse of these factors and underlining the importance of capital in livestock farming. In general, for most of the models tested, dairy production showed constant returns to scale, with an average efficiency level of about 88% for the stochastic frontier models and 90% for the quantile regression estimates. Identified efficiency factors include the storage capacity of silos and milk cooling tanks and the net gross margin of the activity; identified inefficiency factors are the practice of burning and the share of women in the management of production units. Overall, the estimated models point to the need to intensify livestock production and to improve farm infrastructure.
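A full stochastic frontier adds a composed error term (symmetric noise plus one-sided inefficiency) and needs specialised estimation, but the translog specification and the returns-to-scale calculation can be sketched with ordinary least squares on synthetic data. Constant returns are built into the simulation below (hypothetical inputs and elasticities), so the estimate should land near 1.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
l1, l2 = rng.normal(0, 1, (2, n))                    # log inputs (e.g. cows, labour)
ly = 0.6 * l1 + 0.4 * l2 + 0.1 * rng.normal(size=n)  # log output, CRS by design

# Translog design matrix: first-order, squared and cross terms; OLS fit.
X = np.column_stack([np.ones(n), l1, l2, 0.5 * l1**2, 0.5 * l2**2, l1 * l2])
b = np.linalg.lstsq(X, ly, rcond=None)[0]

# Output elasticities evaluated at the sample mean of the log inputs.
m1, m2 = l1.mean(), l2.mean()
e1 = b[1] + b[3] * m1 + b[5] * m2
e2 = b[2] + b[4] * m2 + b[5] * m1
rts = e1 + e2
print(round(float(rts), 2))  # returns to scale, near 1 here
```

Unlike a Cobb-Douglas form, the translog lets elasticities vary with input levels, which is exactly the "interrelation among inputs" the abstract refers to.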
99

Regulamentação dos planos de saúde e risco moral: aplicação da regressão quantílica para dados de contagem / Health plan regulation and moral hazard: an application of quantile regression for count data

Godoy, Márcia Regina January 2008 (has links)
The Brazilian private health insurance sector operated without regulation from the 1940s until 1998, when the government established regulation of the sector. Among other measures, the regulation removed limits on the number of physician visits and forbade risk selection (cream skimming). The objective of this thesis is to investigate whether the regulation resulted in an increase in physician visits by health plan subscribers, that is, whether ex-post moral hazard increased, and to examine changes in the determinants of demand for health insurance before and after the regulation in search of evidence of adverse selection. Four econometric methods were used: Poisson regression, negative binomial regression, quantile regression for count data, and a Probit model. A difference-in-differences estimator was used to estimate the impact of the regulation on the number of physician visits.
The Probit model was used to analyse the determinants of demand for health insurance. The data come from the Brazilian National Household Survey (Pesquisa Nacional por Amostra de Domicílios, PNAD) of 1998 (before the regulation) and 2003 (after it), divided by sex and by epidemiological profile, selecting individuals who reported chronic renal disease. The results show a general increase in the number of visits after the regulation. However, the coefficient of the main variable of interest, the dummy capturing the effect of the regulation on the physician visits of health plan subscribers, was negative and statistically significant for both men and women, in all three count models and in both samples. This indicates that after the regulation, subscribers' visits fell relative to those of individuals without a health plan.
The quantile regression for counts showed that the number of chronic diseases and possession of a health plan are the factors that most affect the number of visits, that the effects of the regressors differ between the sexes, and that they are not uniform across quantiles. The count-data models showed that moral hazard exists both before and after the regulation, even when epidemiological characteristics are controlled for, since insured individuals had more physician visits.
The Probit results suggest the existence of adverse selection after the regulation, since individuals with a higher number of morbidities are more likely to buy a health plan. In sum, the main findings suggest that after the regulation two important problems arose in the private health insurance market: adverse selection and moral hazard. Together, these two problems may generate inefficient outcomes and compromise the sustainability of the Brazilian private health insurance sector.
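The core design, a difference-in-differences interaction inside a count model, can be sketched with a Poisson regression on synthetic data. The signs and sizes of the effects below are assumptions chosen to mimic the reported findings, not estimates from PNAD.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 8000
insured = rng.integers(0, 2, n)        # health plan subscriber
post = rng.integers(0, 2, n)           # after the regulation
# Hypothetical truth: visits rise after regulation, the insured visit more,
# and the insured-specific change after regulation (the DiD effect) is negative.
mu = np.exp(0.5 + 0.3 * post + 0.4 * insured - 0.2 * post * insured)
visits = rng.poisson(mu)

df = pd.DataFrame({'visits': visits, 'post': post, 'insured': insured})
fit = smf.poisson('visits ~ post * insured', df).fit(disp=0)
print(round(fit.params['post:insured'], 3))  # DiD coefficient, negative here
```

The interaction term `post:insured` is the count-model analogue of the regulation dummy the thesis reports; a negative binomial or quantile-count model would replace `smf.poisson` while keeping the same design.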
100

Statistical models for estimating the intake of nutrients and foods from complex survey data

Pell, David Andrew January 2019 (has links)
Background: The consequences of poor nutrition are well known and of wide concern. Governments and public health agencies utilise food and diet surveillance data to make decisions that lead to improvements in nutrition. These surveys often utilise complex sample designs for efficient data collection. There are several challenges in the statistical analysis of dietary intake data collected using complex survey designs, which have not been fully addressed by current methods. Firstly, the shape of the distribution of intake can be highly skewed due to the presence of outlier observations and a large proportion of zero observations arising from the inability of the food diary to capture consumption within the period of observation. Secondly, dietary data is subject to variability arising from day-to-day individual variation in food consumption and measurement error, to be accounted for in the estimation procedure for correct inferences. Thirdly, the complex sample design needs to be incorporated into the estimation procedure to allow extrapolation of results into the target population. This thesis aims to develop novel statistical methods to address these challenges, applied to the analysis of iron intake data from the UK National Diet and Nutrition Survey Rolling Programme (NDNS RP) and UK national prescription data of iron deficiency medication. Methods: 1) To assess the nutritional status of particular population groups a two-part model with a generalised gamma (GG) distribution was developed for intakes that show high frequencies of zero observations. The two-part model accommodated the sources of data variation of dietary intake with a random intercept in each component, which could be correlated to allow a correlation between the probability of consuming and the amount consumed. 
2) To identify population groups at risk of low nutrient intakes, a linear quantile mixed-effects model was developed to model quantiles of the distribution of intake as a function of explanatory variables. The proposed approach was illustrated by comparing the quantiles of iron intake with Lower Reference Nutrient Intake (LRNI) recommendations using the NDNS RP. This thesis extended the estimation procedures of both the two-part model with GG distribution and the linear quantile mixed-effects model to incorporate the complex sample design in three steps: the likelihood function was multiplied by the sampling weights; bootstrap methods were used to estimate the variance; and, finally, the variance estimation of the model parameters was stratified by the survey strata. 3) To evaluate the allocation of resources to alleviate nutritional deficiencies, a linear quantile mixed-effects model was used to analyse the distribution of expenditure on iron deficiency medication across health boards in the UK. Expenditure is likely to depend on the iron status of the region; therefore, for a fair comparison among health boards, iron status was estimated using the method developed in objective 2) and used in the specification of the median amount spent. Each health board is formed by a set of general practices (GPs); therefore, a random intercept was used to induce correlation between expenditures from two GPs in the same health board. Finally, the approaches in objectives 1) and 2) were compared with the traditional approach based on weighted linear regression modelling used in the NDNS RP reports. All analyses were implemented using SAS and R. Results: The two-part model with GG distribution, fitted to the amount of iron consumed from selected episodically consumed foods, showed that females tended to have greater odds of consuming iron from foods but consumed smaller amounts.
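The first of the three survey-design steps, multiplying each unit's likelihood contribution by its sampling weight, can be sketched for a deliberately simple model (a normal mean, not the thesis's mixed-effects models); the weighted pseudo-MLE of the mean then reduces to the weighted average:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
y = rng.normal(10.0, 2.0, size=200)   # observed intakes (illustrative data)
w = rng.uniform(0.5, 2.0, size=200)   # sampling weights from the survey design

# Pseudo-log-likelihood: each unit's log-likelihood contribution is
# multiplied by its sampling weight before summing, as in the thesis's
# first extension step.
def neg_pseudo_loglik(theta):
    mu, log_sigma = theta
    return -np.sum(w * stats.norm.logpdf(y, mu, np.exp(log_sigma)))

res = optimize.minimize(neg_pseudo_loglik, x0=[0.0, 0.0])
mu_hat = res.x[0]
```

For this toy model the weighted MLE coincides with `np.average(y, weights=w)`; the point of the pseudo-likelihood formulation is that the same weighting applies unchanged to models, like the two-part GG and quantile mixed-effects models, with no closed-form estimator. The remaining steps (bootstrap variance estimation stratified by survey strata) are not shown.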
As age groups increased, consumption tended to increase relative to the reference group, though the odds of consumption varied. Iron consumption also appeared to depend on National Statistics Socio-economic Classification (NS-SEC) group, with lower social groups consuming less, in general. The quantiles of iron intake estimated using the linear quantile mixed-effects model showed that more than 25% of females aged 11-50y fall below the LRNI, and that 11-18y girls are the group at highest risk of deficiency in the UK. Predictions of spending on iron medication in the UK based on the linear quantile mixed-effects model showed that areas of higher iron intake had lower spending on treating iron deficiency. In a geographical display of expenditure, Northern Ireland featured the lowest amount spent. Comparing the results from the methods proposed here showed that the traditional approach based on weighted regression analysis could produce spurious associations. Discussion: This thesis developed novel approaches to the analysis of complex survey dietary data to address three important objectives of diet surveillance, namely estimation of mean food intake by population groups, identification of groups at high risk of nutrient deficiency, and allocation of resources to alleviate nutrient deficiencies. The methods provided models of good fit to dietary data, accounted for the sources of data variability, and extended the estimation procedures to incorporate the complex sample survey design. The use of a GG distribution for modelling intake is an important improvement over existing methods, as it includes many distributions with different shapes and its domain is the non-negative real line. The two-part model accommodated the sources of variation in dietary intake with a random intercept in each component; the intercepts could be correlated, allowing dependence between the probability of consuming and the amount consumed.
This also improves on existing approaches that assume zero correlation. The linear quantile mixed-effects model utilises the asymmetric Laplace distribution, which can also accommodate many different distributional shapes, and its likelihood-based estimation is robust to model misspecification. This method is an important improvement over existing methods used in nutritional research, as it explicitly models the quantiles in terms of explanatory variables using a novel quantile regression model with random effects. The application of these models to UK national data confirmed the association between poorer diets and lower social class, identified females aged 11-50y as a group at high risk of iron deficiency, and highlighted Northern Ireland as the region with the lowest expenditure on iron prescriptions.
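The connection between the asymmetric Laplace distribution and quantile estimation can be made concrete: maximising an asymmetric Laplace likelihood is equivalent to minimising the check (pinball) loss, whose minimiser is the tau-th quantile. A minimal sketch on skewed, intake-like data (illustrative only, without the covariates or random effects of the thesis's model):

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=3.0, size=2000)  # skewed "intake" data
tau = 0.25  # lower quartile, as in comparisons against the LRNI

# Check (pinball) loss: its minimiser is the tau-th quantile of y, which is
# equivalent to maximum likelihood under an asymmetric Laplace distribution
# with skewness parameter tau.
def check_loss(q):
    u = y - q
    return np.sum(u * (tau - (u < 0)))

# The minimiser q_hat recovers the empirical lower quartile.
q_hat = optimize.minimize_scalar(check_loss).x
```

The thesis's model replaces the single location `q` with a linear predictor containing explanatory variables and random effects, so quantiles can be reported by population group rather than for the sample as a whole.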
