Global ETD Search

1	Statistical analysis of grouped data Crafford, Gretel 01 July 2008 (has links) The maximum likelihood (ML) estimation procedure of Matthews and Crowther (1995: A maximum likelihood estimation procedure when modelling in terms of constraints. South African Statistical Journal, 29, 29-51) is used to fit a continuous distribution to a grouped data set. This grouped data set may be a single frequency distribution or several frequency distributions arising from a cross-classification of several factors in a multifactor design. It is also shown how to fit a bivariate normal distribution to a two-way contingency table in which the two underlying continuous variables are jointly normally distributed. The thesis is organized in three parts, each playing a vital role in explaining the analysis of grouped data with the ML estimation procedure of Matthews and Crowther. In Part I the ML estimation procedure of Matthews and Crowther is formulated; this procedure plays an integral role and is implemented in all three parts of the thesis. In Part I the exponential distribution is fitted to a grouped data set to explain the technique. Two different formulations of the constraints are employed in the ML estimation procedure, and they provide identical results. The method is further justified by a simulation study. As with the exponential distribution, the estimation of the normal distribution is explained in detail. Part I is summarized in Chapter 5, where a general method is outlined for fitting continuous distributions to a grouped data set. Distributions such as the Weibull, the log-logistic and the Pareto can be fitted very effectively by formulating the vector of constraints in terms of a linear model. Part II explains how to model a grouped response variable in a multifactor design. This multifactor design arises from a cross-classification of the various factors or independent variables to be analysed.
The cross-classification of the factors results in a total of T cells, each containing a frequency distribution. Distributions are fitted simultaneously to all T cells of the multifactor design, under the additional constraint that the parameters of the underlying continuous distributions satisfy a certain structure or design. The effect of the factors on the grouped response variable can be evaluated from this fitted design. Applications of a single-factor and a two-factor model demonstrate the versatility of the technique. Part III considers a two-way contingency table in which the two variables have an underlying bivariate normal distribution. Estimating the bivariate normal distribution reveals the complete underlying continuous structure between the two variables, and the ML estimate of the correlation coefficient ρ is used to great effect to describe their relationship. In addition to an application, a simulation study is provided to support the proposed method. / Thesis (PhD (Mathematical Statistics))--University of Pretoria, 2007. / Statistics / unrestricted Grouped data set UCTD
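The thesis fits distributions with the constrained ML procedure of Matthews and Crowther, which is not reproduced here. As a simpler point of comparison, the sketch below fits an exponential distribution to a single grouped frequency distribution by directly maximizing the multinomial log-likelihood of the class frequencies over the rate λ. The class edges, the frequencies and the golden-section optimizer are illustrative assumptions, and all function names are hypothetical.

```python
import math

def exp_cell_probs(lam, edges):
    """Exponential(rate=lam) probabilities of the classes [edges[j], edges[j+1]).
    The last edge may be math.inf for an open-ended final class."""
    return [math.exp(-lam * a) - math.exp(-lam * b) for a, b in zip(edges[:-1], edges[1:])]

def grouped_loglik(lam, edges, freqs):
    """Multinomial log-likelihood (up to a constant) of the observed class frequencies."""
    total = 0.0
    for f, p in zip(freqs, exp_cell_probs(lam, edges)):
        if f > 0:
            if p <= 0.0:
                return -math.inf  # class probability underflowed: treat as impossible
            total += f * math.log(p)
    return total

def fit_exponential_grouped(edges, freqs, lo=1e-4, hi=50.0, tol=1e-10):
    """ML estimate of the rate by golden-section search on [lo, hi].
    Each term of the grouped exponential log-likelihood is concave in lam,
    so the log-likelihood is unimodal and the search is valid."""
    def ll(lam):
        return grouped_loglik(lam, edges, freqs)

    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = ll(c), ll(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = ll(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = ll(d)
    return (a + b) / 2.0

# Hypothetical example: expected class frequencies for n = 1000 and true rate 0.5.
edges = [0.0, 1.0, 2.0, 4.0, 8.0, math.inf]
freqs = [1000 * p for p in exp_cell_probs(0.5, edges)]
lam_hat = fit_exponential_grouped(edges, freqs)
```

Only the class boundaries and counts are used, never the raw observations; with exact expected frequencies the estimate recovers the generating rate.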
2	The Relationship Between the Mean, Median and Mode with Unimodal Grouped Data Zheng, Shimin, Mogusu, Eunice, Veeranki, Sreenivas P., Quinn, Megan, Cao, Yan 16 May 2016 (has links) It is widely believed that the median is “usually” between the mean and the mode for skewed unimodal distributions. However, this inequality is not always true, especially with grouped data. Because complete raw data are often unavailable, it is all the more important to evaluate this characteristic in grouped data. There is a gap in the current statistical literature on assessing the mean–median–mode inequality for grouped data. The study aims to evaluate the relationship between the mean, median, and mode with unimodal grouped data; derive conditions for their inequalities; and present their application. mean median mode unimodal grouped data asymmetric unimodal distribution grouped data Biostatistics and Epidemiology Biostatistics Epidemiology
3	Estimation and inference of microeconometric models based on moment condition models Khatoon, Rabeya January 2014 (has links) The existing estimation techniques for grouped data models can be analyzed as a class of instrumental variable-Generalized Method of Moments (GMM) estimators, with the matrix of group indicators as the set of instruments. The econometric literature (e.g. Smith, 1997; Newey and Smith, 2004) shows that, in some cases of empirical relevance, GMM has shortcomings: the large-sample behaviour of the estimator can differ from its finite-sample properties. Generalized Empirical Likelihood (GEL) estimators have been developed that are not sensitive to the nature and number of instruments and that possess better finite-sample properties than GMM estimators. Assuming that the data vector is iid within a group but inid (independent, not identically distributed) across groups, this thesis develops GEL estimators for a grouped data model whose population moment conditions specify zero-mean errors in each group. A first-order asymptotic analysis shows that the estimators are √N-consistent (N being the sample size) and normally distributed. The thesis explores second-order bias properties that reveal the sources of bias and the differences between choices of GEL estimator. Specifically, the second-order bias depends on the third moments of the group errors and on the correlation between the group errors and the explanatory variables. With symmetric errors and no endogeneity, all three estimators, Empirical Likelihood (EL), Exponential Tilting (ET) and the Continuous Updating Estimator (CUE), are unbiased. A detailed simulation exercise compares the performance of the EL and ET estimators and their bias-corrected versions with that of the standard 2SLS/GMM estimators.
The simulation results reveal that, with a few strong instruments, the 2SLS/GMM estimators are adequate; with many and/or weak instruments, a higher degree of endogeneity, or a varied signal-to-noise ratio, however, the bias-corrected EL and ET estimators dominate, both in bias and in the coverage accuracy of asymptotic confidence intervals, even for a considerably large sample. The thesis also considers within-group dependent data, to assess the consequences of violating a key assumption, namely within-group iid sampling. Theoretical analysis and simulation results show that ignoring this feature can result in misleading inference. The proposed estimators are used to estimate the return to an additional year of schooling in the UK using Labour Force Survey (LFS) data over 1997-2009. Pooling the 13 years of data yields roughly the same estimate, an 11.27% return for British-born men aged 25-50, under every estimation technique. In contrast, using only the 2009 LFS data (a relatively small sample with many weak instruments), the estimated return for men holding a first degree is 13.88% with the bias-corrected EL estimator, whereas the 2SLS estimator yields 6.8%. 338.5
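The opening observation of this abstract, that grouped-data estimators are IV-GMM estimators whose instruments are the group indicators, can be checked in the simplest linear case. With one regressor and an intercept, 2SLS using a full set of group dummies as instruments gives the same slope as weighted least squares of the group-mean outcome on the group-mean regressor, with group sizes as weights. This is a standard identity, not the thesis's GEL estimators; the function names and the data are hypothetical.

```python
from collections import defaultdict

def group_means(values, groups):
    """Return ({group: mean of values}, {group: size})."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}, counts

def tsls_slope_group_instruments(y, x, groups):
    """2SLS slope of y on x (with intercept), instruments = full set of group dummies.
    First stage: the fitted value of x is its group mean.
    Second stage: OLS of y on an intercept and the fitted x."""
    xbar_g, _ = group_means(x, groups)
    xhat = [xbar_g[g] for g in groups]
    m = sum(xhat) / len(xhat)  # equals the overall mean of x
    ybar = sum(y) / len(y)
    num = sum((xh - m) * (yi - ybar) for xh, yi in zip(xhat, y))
    den = sum((xh - m) ** 2 for xh in xhat)
    return num / den

def wls_slope_on_group_means(y, x, groups):
    """Weighted least-squares slope of group-mean y on group-mean x (weights = group sizes)."""
    xbar_g, n_g = group_means(x, groups)
    ybar_g, _ = group_means(y, groups)
    total = sum(n_g.values())
    mx = sum(n_g[g] * xbar_g[g] for g in n_g) / total
    my = sum(n_g[g] * ybar_g[g] for g in n_g) / total
    num = sum(n_g[g] * (xbar_g[g] - mx) * (ybar_g[g] - my) for g in n_g)
    den = sum(n_g[g] * (xbar_g[g] - mx) ** 2 for g in n_g)
    return num / den

# Hypothetical individual-level data in three groups.
groups = [0, 0, 0, 1, 1, 2, 2, 2, 2]
x = [1.0, 2.0, 1.5, 3.0, 3.5, 5.0, 4.5, 6.0, 5.5]
y = [2.1, 2.9, 2.4, 4.2, 4.8, 6.1, 5.7, 7.2, 6.6]
b_2sls = tsls_slope_group_instruments(y, x, groups)
b_wls = wls_slope_on_group_means(y, x, groups)
```

The two slopes coincide because the first-stage fitted values are just the group means, so only between-group variation identifies the coefficient.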
4	Relationship Between Mean, Median, Mode with Unimodal Grouped Data Zheng, Shimin, Mogusu, Eunice, Veeranki, Sreenivas P., Quinn, Megan 03 November 2015 (has links) Background: It is widely believed that the median of a unimodal distribution is "usually" between the mean and the mode for right-skewed or left-skewed distributions. However, this is not always true, especially with grouped data. For some research, analyses must be based on grouped data, since complete raw data are not always available. A gap exists in the body of research on the mean-median-mode inequality for grouped data. Methods: For grouped data, the median is $Me = L + \frac{n/2 - F}{f_m}\,d$ and the mode is $Mo = L + \frac{D_1}{D_1 + D_2}\,d$, where $L$ is the lower boundary of the median/modal group, $n$ is the total frequency, $F$ and $G$ are the cumulative frequencies of the groups before and after the median/modal group respectively, $D_1 = f_m - f_{m-1}$ and $D_2 = f_m - f_{m+1}$, $f_m$ is the frequency of the median/modal group, and $f_{m-1}$ and $f_{m+1}$ are the frequencies of the premodal and postmodal groups respectively. It is assumed that there are $k$ groups with $k$ odd, that the group width $d$ is the same for every group, and that the mode and median both lie in the $((k+1)/2)$th group. Necessary and sufficient conditions are derived for each of the six possible orderings of the mean, median and mode. Results: Table available at https://apha.confex.com/apha/143am/webprogram/Paper326538.html Conclusion: For grouped data, the mean, median and mode can occur in any of the six possible orders. relationship mean median mode unimodal grouped data Biostatistics and Epidemiology Biostatistics Public Health
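The grouped median and mode formulas above, together with the usual midpoint formula for the grouped mean, can be evaluated directly from a frequency table. This is a minimal sketch assuming equal-width groups and a unique modal group, with $f_{m-1}$ or $f_{m+1}$ taken as 0 when the modal group is at an end; the function name and the example table are hypothetical.

```python
def grouped_summary(lower, width, freqs):
    """Mean, median and mode of equal-width grouped data.
    lower: lower boundary of the first group; width: common group width d;
    freqs: group frequencies f_1..f_k."""
    n = sum(freqs)
    mids = [lower + (j + 0.5) * width for j in range(len(freqs))]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n

    # Median group: first group whose cumulative frequency reaches n/2.
    F = 0  # cumulative frequency of the groups before the median group
    for j, f in enumerate(freqs):
        if F + f >= n / 2:
            break
        F += f
    median = lower + j * width + (n / 2 - F) / freqs[j] * width  # L + (n/2 - F)/f_m * d

    # Modal group: group with the largest frequency (assumed unique).
    m = max(range(len(freqs)), key=freqs.__getitem__)
    d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)               # D1 = f_m - f_{m-1}
    d2 = freqs[m] - (freqs[m + 1] if m + 1 < len(freqs) else 0)  # D2 = f_m - f_{m+1}
    mode = lower + m * width + d1 / (d1 + d2) * width            # L + D1/(D1 + D2) * d
    return mean, median, mode

# Hypothetical table: k = 5 groups of width 10 starting at 0, n = 50.
mean, median, mode = grouped_summary(0.0, 10.0, [5, 8, 20, 10, 7])
# mean = 26.2, median = 26.0, mode ≈ 25.45, so here mode < median < mean
```

For this table the median and mode both fall in the middle group, matching the abstract's setting; other frequency patterns can produce any of the six orderings.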
5	The Asymptotic Loss of Information for Grouped Data Felsenstein, Klaus, Pötzelberger, Klaus January 1995 (has links) (PDF) We study the loss of information (measured in terms of the Kullback-Leibler distance) caused by observing "grouped" data (observing only a discretized version of a continuous random variable). We analyse the asymptotic behaviour of the loss of information as the partition becomes finer. In the case of a univariate observation, we compute the optimal rate of convergence and characterize asymptotically optimal partitions (into intervals). In the multivariate case we derive the asymptotically optimal regular sequences of partitions. Furthermore, we compute the asymptotically optimal transformation of the data when a sequence of partitions is given. Examples demonstrate the efficiency of the suggested discretizing strategy even for few intervals. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
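The abstract does not spell out its loss functional, so as a hedged illustration we take the loss to be the drop in Kullback-Leibler divergence between two normal distributions when only the grouped (cell) counts are observed: the continuous divergence minus the divergence between the two discretized distributions. By the data-processing inequality this is non-negative, and it shrinks when the partition is refined. The equal-width partition on [lo, hi] with two open tail cells, and all names, are assumptions for the sketch.

```python
import math

SQRT2 = math.sqrt(2.0)

def normal_cell_prob(a, b, mu, sigma):
    """P(a <= X < b) for X ~ N(mu, sigma^2), using erfc to stay accurate in the tails."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    if za >= 0:  # cell in the upper tail: difference of survival functions
        return 0.5 * (math.erfc(za / SQRT2) - math.erfc(zb / SQRT2))
    return 0.5 * (math.erfc(-zb / SQRT2) - math.erfc(-za / SQRT2))

def information_loss(mu1, mu2, sigma, width, lo=-8.0, hi=8.0):
    """KL(N(mu1, s^2) || N(mu2, s^2)) minus the KL between their grouped versions.
    Cells: (-inf, lo), equal-width cells on [lo, hi] (width must divide hi - lo), [hi, inf)."""
    m = round((hi - lo) / width)
    edges = [-math.inf] + [lo + i * width for i in range(m + 1)] + [math.inf]
    kl_continuous = (mu1 - mu2) ** 2 / (2 * sigma ** 2)
    kl_grouped = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        p = normal_cell_prob(a, b, mu1, sigma)
        q = normal_cell_prob(a, b, mu2, sigma)
        if p > 0:
            kl_grouped += p * math.log(p / q)
    return kl_continuous - kl_grouped

loss_coarse = information_loss(0.0, 0.5, 1.0, width=2.0)
loss_fine = information_loss(0.0, 0.5, 1.0, width=0.5)  # refines the coarse partition
```

Because the width-0.5 cells refine the width-2 cells, the finer grouping must lose less information; how fast the loss vanishes as the cells shrink is the kind of rate the paper studies.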
6	On Learning from Collective Data Xiong, Liang 01 December 2013 (has links) In many machine learning problems and application domains, the data are naturally organized in groups. For example, a video sequence is a group of images, an image is a group of patches, a document is a group of paragraphs/words, and a community is a group of people. We call such data collective data. In this thesis, we study how and what we can learn from collective data. Machine learning usually focuses on individual objects, each described by a feature vector and studied as a point in some metric space. When approaching collective data, researchers often reduce the groups to vectors to which traditional methods can be applied. We, on the other hand, develop machine learning methods that respect the collective nature of the data and learn from the groups directly. Several different approaches are taken to address this learning problem. When the groups consist of unordered discrete data points, each group can naturally be characterized by its sufficient statistic, the histogram. For this case we develop efficient methods, based on matrix and tensor factorization, to address outliers and temporal effects in the data. To learn from groups that contain multi-dimensional real-valued vectors, we develop both generative methods based on hierarchical probabilistic models and discriminative methods using group kernels based on new divergence estimators. With these tools, we can accomplish various tasks such as classification, regression, clustering, anomaly detection, and dimensionality reduction on collective data. We further consider the practical side of the divergence-based algorithms. To reduce their time and space requirements, we evaluate methods for reducing the size of the groups and identify those that do so with little impact on accuracy.
We also propose the conditional divergence, along with an efficient estimator, to correct sampling biases that might be present in the data. Finally, we develop methods to learn when some divergences are missing, whether because of insufficient computational resources or extreme sampling biases. In addition to designing new learning methods, we use them to aid the scientific discovery process. In our collaboration with astronomers and physicists, we see that the new techniques can indeed help scientists make the best of their data.
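The thesis's divergence estimators and group kernels are not reproduced here. As a simple stand-in for the histogram-and-divergence idea, the sketch below summarizes each group of discrete items by a smoothed histogram, measures the Jensen-Shannon divergence between two histograms, and turns it into a similarity exp(-D/scale). The smoothing constant `alpha`, the `scale` parameter, and all function names are assumptions.

```python
import math
from collections import Counter

def group_histogram(points, vocab, alpha=1.0):
    """Additively smoothed histogram of a group of discrete items over a fixed vocabulary."""
    counts = Counter(points)
    total = len(points) + alpha * len(vocab)
    return [(counts[v] + alpha) / total for v in vocab]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def group_similarity(g1, g2, vocab, scale=1.0):
    """Similarity between two groups: exp(-JS(histogram1, histogram2) / scale)."""
    h1, h2 = group_histogram(g1, vocab), group_histogram(g2, vocab)
    return math.exp(-js_divergence(h1, h2) / scale)

# Hypothetical groups, e.g. two short documents as bags of words.
vocab = ["a", "b", "c"]
g1 = ["a", "a", "b"]
g2 = ["a", "b", "b", "c"]
sim_12 = group_similarity(g1, g2, vocab)
```

Each group is compared as a whole rather than being flattened into one hand-made feature vector, which is the spirit of the collective-data approach; the resulting similarities could feed a kernel method.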
7	How does monetary policy affect income inequality in Japan? Evidence from grouped data Feldkircher, Martin, Kakamu, Kazuhiko January 2018 (has links) (PDF) We examine the effects of monetary policy on income inequality in Japan using a novel econometric approach that jointly estimates the Gini coefficient, based on micro-level grouped data of households, and the dynamics of macroeconomic quantities. Our results indicate different effects on income inequality for different types of households: a monetary tightening increases inequality when the income data are based on households whose head is employed (workers' households), while the effect reverses over the medium term under a broader definition of households. Differences in the relative strength of the transmission channels can account for this finding. Finally, we demonstrate that the proposed joint estimation strategy leads to more informative inference, whereas the frequently used two-step estimation approach yields inconclusive results. / Series: Working Papers in Regional Science JEL C30, E52, F41, E32
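The paper estimates the Gini coefficient from grouped household data jointly with the macroeconomic dynamics, and that joint model is not reproduced here. For orientation only, the sketch below computes a descriptive Gini coefficient from grouped population and income shares with the trapezoidal Lorenz-curve approximation, $G = 1 - \sum_k (X_k - X_{k-1})(Y_k + Y_{k-1})$. Because it treats incomes as equal within each group, it understates the true Gini. The function name and the example shares are hypothetical.

```python
def gini_from_grouped_shares(pop_shares, income_shares):
    """Gini coefficient from grouped data via the trapezoidal Lorenz curve.
    Groups must be ordered from poorest to richest, and each list should sum to 1.
    Inequality within groups is ignored, so the result is a lower bound."""
    x_prev = y_prev = 0.0  # previous Lorenz-curve point (X_{k-1}, Y_{k-1})
    area_term = 0.0
    for p, s in zip(pop_shares, income_shares):
        x, y = x_prev + p, y_prev + s  # cumulative population and income shares
        area_term += (x - x_prev) * (y + y_prev)
        x_prev, y_prev = x, y
    return 1.0 - area_term

quintiles = [0.2] * 5
g_equal = gini_from_grouped_shares(quintiles, [0.2] * 5)              # perfect equality
g_top = gini_from_grouped_shares(quintiles, [0.0, 0.0, 0.0, 0.0, 1.0])  # all income in top quintile
```

With five groups the largest attainable value is 0.8 rather than 1, which illustrates how coarse grouping limits what can be read off the data.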
8	The Relationship Between the Mean, Median, and Mode with Grouped Data Zheng, Shimin, Mogusu, Eunice, Veeranki, Sreenivas P., Quinn, Megan, Cao, Yan 03 May 2016 (has links) It is widely believed that the median is “usually” between the mean and the mode for skewed unimodal distributions. However, this inequality is not always true, especially with grouped data. Because complete raw data are often unavailable, it is all the more important to evaluate this characteristic in grouped data. There is a gap in the current statistical literature on assessing the mean–median–mode inequality for grouped data. The study aims to evaluate the relationship between the mean, median, and mode with unimodal grouped data; derive conditions for their inequalities; and present their application. 62E99 asymmetric unimodal distribution grouped data mean median mode Biostatistics and Epidemiology Biostatistics
9	MAKING A GROUPED-DATA FREQUENCY TABLE: DEVELOPMENT AND EXAMINATION OF THE ITERATION ALGORITHM Lohaka, Hippolyte O. January 2007 (has links) No description available. Education, Tests and Measurements Iteration algorithm grouped data frequency table Monte Carlo simulations MANCOVA ANCOVA
10	Comparing measures of fit for circular distributions Sun, Zheng 04 May 2010 (has links) This thesis shows how to test the fit of a data set to a number of different models using Watson's U² statistic, for both grouped and continuous data. Watson's U² statistic was introduced for continuous data, but in recent work it has been adapted for grouped data. When Watson's U² is used for continuous data, however, its asymptotic distribution is difficult to obtain, particularly for some skewed circular distributions with four or five parameters. Until now, asymptotic points for U² had been worked out only for the uniform and von Mises distributions among all circular distributions. We give U² asymptotic points for the wrapped exponential distribution, and we show that, for other, more advanced circular distributions, U² asymptotic points are usually easier to obtain when the data are grouped. In practice, all continuous data are grouped into cells whose width is determined by the accuracy of the measurement. It will be found useful to treat such data as grouped, with a sufficient number of cells, in the examples to be analyzed. When the data are treated as grouped, the asymptotic points for U² match well with those obtained when the data are treated as continuous. Asymptotic theory for U² adapted for grouped data is given in the thesis. Monte Carlo studies show that, for reasonable sample sizes, the asymptotic points give good approximations to the p-values of the test. Goodness-of-fit Watson's Statistic Circular distributions Maximum likelihood estimation grouped data parametric bootstrap
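For continuous data, Watson's U² has a standard computing form in terms of the sorted fitted CDF values $z_{(1)} \le \dots \le z_{(n)}$: $U^2 = \sum_{i=1}^{n} \left(z_{(i)} - \frac{2i-1}{2n}\right)^2 + \frac{1}{12n} - n\left(\bar z - \tfrac12\right)^2$. The sketch below applies it to the circular uniform model. It covers only the continuous statistic; the grouped-data version and the asymptotic points developed in the thesis are not shown, and the function names are hypothetical.

```python
import math

def watson_u2(angles, cdf):
    """Watson's U^2 for continuous circular data under a fully specified model.
    cdf maps an angle to its fitted distribution-function value in [0, 1)."""
    z = sorted(cdf(a) for a in angles)
    n = len(z)
    w2 = sum((zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1)) + 1 / (12 * n)
    zbar = sum(z) / n
    return w2 - n * (zbar - 0.5) ** 2  # the correction term makes U^2 origin-free

def uniform_circular_cdf(theta):
    """CDF of the circular uniform distribution, with angles in radians measured from 0."""
    return (theta % (2 * math.pi)) / (2 * math.pi)

# Perfectly evenly spread angles give the minimum value 1/(12n).
n = 10
even = [2 * math.pi * (2 * i - 1) / (2 * n) for i in range(1, n + 1)]
u2_even = watson_u2(even, uniform_circular_cdf)

# U^2 does not depend on where the zero direction is placed.
sample = [0.1, 0.5, 1.2, 2.0, 4.0, 5.5]
u2_sample = watson_u2(sample, uniform_circular_cdf)
u2_rotated = watson_u2([a + 1.0 for a in sample], uniform_circular_cdf)
```

Invariance to the choice of origin is what makes U², rather than the Cramér-von Mises W², suitable for circular data.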
