1 |
Probabilistic Models for Spatially Aggregated Data / 空間集約データのための確率モデル
Tanaka, Yusuke, 23 March 2020
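Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Kō No. 22586 / Informatics Doctorate No. 723 / 新制||情||124 (University Library) / Department of Systems Science, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Toshiyuki Tanaka, Professor Shin Ishii, Professor Hidetoshi Shimodaira / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM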
|
2 |
The Swedish payroll tax reduction for young workers: A study of effects found using publicly available aggregated (macro) data
Bergström, Balder, January 2019
In 2007, the Swedish payroll tax was reduced for young workers in an attempt to lower the perceived high unemployment among Swedish youths. The reform was rolled back in 2016. For this period there is a rich supply of publicly available aggregated (macro) data. This thesis examines, first, whether the aggregated data are suitable for policy evaluation of the reform and, second, the effects of the reform's introduction and repeal. Both a conventional fixed effects model and a less conventional synthetic control method were used. Neither method produced an unbiased, consistent and statistically significant estimate of the reform's treatment effects. Instead, the results of this thesis suggest that the publicly available aggregated data do not contain enough information to evaluate such reforms.
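To make the fixed effects idea concrete, here is a minimal sketch of a one-way within estimator, which removes group means before estimating a slope. The function name `within_estimator`, the single-regressor setup and the toy data are assumptions for illustration only; the abstract does not state the thesis's actual specification.

```python
from collections import defaultdict

def within_estimator(groups, x, y):
    """Slope of y on x after removing group means (one-way fixed effects).

    Hypothetical helper: groups[i] labels the unit (e.g. a region or age group)
    that observation i belongs to.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # per group: sum of x, sum of y, count
    for g, xi, yi in zip(groups, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sx, sy, n = sums[g]
        dx, dy = xi - sx / n, yi - sy / n  # deviations from the group mean
        num += dx * dy
        den += dx * dx
    return num / den

# Toy data: two groups with different intercepts but a common slope of 2.
groups = ["a", "a", "a", "b", "b", "b"]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 3.0]
y = [5.0 + 2.0 * v for v in x[:3]] + [-1.0 + 2.0 * v for v in x[3:]]
slope = within_estimator(groups, x, y)
```

Demeaning absorbs any time-invariant difference between groups, which is why the differing intercepts in the toy data do not affect the estimated slope.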
|
3 |
Improved Methods for Interrupted Time Series Analysis Useful When Outcomes are Aggregated: Accounting for heterogeneity across patients and healthcare settings
Ewusie, Joycelyne E, January 2019
This is a sandwich thesis / In an interrupted time series (ITS) design, data are collected at multiple time points before and after the implementation of an intervention or program to investigate the effect of the intervention on an outcome of interest. ITS design is often implemented in healthcare settings and is considered the strongest quasi-experimental design in terms of internal and external validity as well as its ability to establish causal relationships. There are several statistical methods that can be used to analyze data from ITS studies. Nevertheless, limitations exist in practical applications, where researchers inappropriately apply the methods, and frequently ignore the assumptions and factors that may influence the optimality of the statistical analysis. Moreover, there is little to no guidance available regarding the application of the various methods, and a standardized framework for analysis of ITS studies does not exist. As such, there is a need to identify and compare existing ITS methods in terms of their strengths and limitations. Their methodological challenges also need to be investigated to inform and direct future research. In light of this, this PhD thesis addresses two main objectives: 1) to conduct a scoping review of the methods that have been employed in the analysis of ITS studies, and 2) to develop improved methods that address a major limitation of the statistical methods frequently used in ITS data analysis. These objectives are addressed in three projects.
For the first project, a scoping review of the methods that have been used in analyzing ITS data was conducted, with the focus on ITS applications in health research. The review was based on the Arksey and O’Malley framework and the Joanna Briggs Handbook for scoping reviews. A total of 1389 studies were included in our scoping review. The articles were grouped into methods papers and applications papers based on the focus of the article. For the methods papers, we narratively described the identified methods and discussed their strengths and limitations. The application papers were summarized using frequencies and percentages. We identified some limitations of current methods and provided some recommendations useful in health research.
In the second project, we developed and presented an improved method for ITS analysis when the data at each time point are aggregated across several participants, which is the most common case in ITS studies in healthcare settings. We considered the segmented linear regression approach, which our scoping review identified as the most frequently used method in ITS studies. When data are aggregated, heterogeneity is introduced due to variability in the patient population within sites (e.g. healthcare facilities) and this is ignored in the segmented linear regression method. Moreover, statistical uncertainty (imprecision) is introduced in the data because of the sample size (number of participants from whom data are aggregated). Ignoring this variability and uncertainty will likely lead to invalid estimates and loss of statistical power, which in turn leads to erroneous conclusions. Our proposed method incorporates patient variability and sample size as weights in a weighted segmented regression model. We performed extensive simulations and assessed the performance of our method using established performance criteria, such as bias, mean squared error, level and statistical power. We also compared our method with the segmented linear regression approach. The results indicated that the weighted segmented regression was uniformly more precise, less biased and more powerful than the segmented linear regression method.
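The weighted segmented regression described above can be sketched as a weighted least squares fit of the usual segmented model (intercept, pre-intervention slope, level change, slope change). The weight `n / sd**2` (the inverse variance of each aggregated mean) is an assumption chosen for illustration; the abstract says only that patient variability and sample size enter as weights. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def weighted_segmented_regression(means, sds, ns, t0):
    """Fit mean_t = b0 + b1*t + b2*post_t + b3*post_t*(t - t0) by weighted least squares.

    means, sds, ns: aggregated outcome, standard deviation and sample size at
    time points t = 0, 1, ...; t0: index of the first post-intervention point.
    Returns [b0, b1, b2, b3].
    """
    rows, weights = [], []
    for t, (sd, n) in enumerate(zip(sds, ns)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, post * (t - t0)])
        weights.append(n / sd ** 2)  # assumed weight: inverse variance of the mean
    k = 4
    # Normal equations of weighted least squares: (X' W X) b = X' W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, means)) for i in range(k)]
    return solve(xtwx, xtwy)
```

With all weights equal, this reduces to the ordinary segmented linear regression that the abstract uses as the comparison method.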
In the third project, we extended the weighted method to multisite ITS studies, where data are aggregated at two levels: across several participants within sites as well as across multiple sites. The extended method incorporates the two levels of heterogeneity using weights, where the weights are defined using patient variability, sample size, number of sites and site-to-site variability. This extended weighted regression model, which follows the weighted least squares approach, is employed to estimate parameters and perform significance testing. We conducted extensive empirical evaluations using various scenarios generated from a multisite ITS study and compared the performance of our method with that of the segmented linear regression model as well as a pooled analysis method previously developed for multisite studies. We observed that for most scenarios considered, our method produced estimates with narrower 95% confidence intervals and smaller p-values, indicating that our method is more precise and is associated with more statistical power. In some scenarios, where we considered low levels of heterogeneity, our method and the previously proposed method showed comparable results.
In conclusion, this PhD thesis facilitates future ITS research by laying the groundwork for developing standard guidelines for the design and analysis of ITS studies. The proposed improved method for ITS analysis, which is the weighted segmented regression, contributes to the advancement of ITS research and will enable researchers to optimize their analysis, leading to more precise and powerful results. / Thesis / Doctor of Philosophy (PhD)
|
4 |
Inference for Discrete Time Stochastic Processes using Aggregated Survey Data
Davis, Brett Andrew, Brett.Davis@abs.gov.au, January 2003
We consider a longitudinal system in which transitions between the states are governed by a discrete time finite state space stochastic process X. Our aim, using aggregated sample survey data of the form typically collected by official statistical agencies, is to undertake model based inference for the underlying process X. We will develop inferential techniques for continuing sample surveys of two distinct types. First, longitudinal surveys in which the same individuals are sampled in each cycle of the survey. Second, cross-sectional surveys which sample the same population in successive cycles but with no attempt to track particular individuals from one cycle to the next. Some of the basic results have appeared in Davis et al. (2001) and Davis et al. (2002).
Longitudinal surveys provide data in the form of transition frequencies between the states of X. In Chapter Two we develop a method for modelling and estimating the one-step transition probabilities in the case where X is a non-homogeneous Markov chain and transition frequencies are observed at unit time intervals. However, due to their expense, longitudinal surveys are typically conducted at widely, and sometimes irregularly, spaced time points. That is, the observable frequencies pertain to multi-step transitions. Continuing to assume the Markov property for X, in Chapter Three, we show that these multi-step transition frequencies can be stochastically interpolated to provide accurate estimates of the one-step transition probabilities of the underlying process. These estimates for a unit time increment can be used to calculate estimates of expected future occupation time in the different states of X, conditional on an individual's state at the initial point of observation.
For reasons of cost, most statistical collections run by official agencies are cross-sectional sample surveys. The data observed from an on-going survey of this type are marginal frequencies in the states of X at a sequence of time points. In Chapter Four we develop a model based technique for estimating the marginal probabilities of X using data of this form. Note that, in contrast to the longitudinal case, the Markov assumption does not simplify inference based on marginal frequencies. The marginal probability estimates enable estimation of future occupation times (in each of the states of X) for an individual of unspecified initial state. However, in the applications of the technique that we discuss (see Sections 4.4 and 4.5) the estimated occupation times will be conditional on both gender and initial age of individuals.
The longitudinal data envisaged in Chapter Two are those obtained from the surveillance of the same sample in each cycle of an on-going survey. In practice, to preserve data quality it is necessary to control respondent burden using sample rotation. This is usually achieved using a mechanism known as rotation group sampling. In Chapter Five we consider the particular form of rotation group sampling used by the Australian Bureau of Statistics in their Monthly Labour Force Survey (from which official estimates of labour force participation rates are produced). We show that our approach to estimating the one-step transition probabilities of X from transition frequencies observed at incremental time intervals, developed in Chapter Two, can be modified to deal with data collected under this sample rotation scheme. Furthermore, we show that valid inference is possible even when the Markov property does not hold for the underlying process.
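As a simplified illustration of the quantities discussed above, the sketch below estimates one-step transition probabilities from observed transition frequencies and then computes expected occupation times conditional on the initial state. It assumes a homogeneous Markov chain observed at unit intervals, which is a deliberate simplification: the thesis treats the non-homogeneous case, multi-step data and rotation sampling. The function names are hypothetical.

```python
def estimate_transition_matrix(counts):
    """Row-normalise transition frequencies: p[i][j] = counts[i][j] / sum_j counts[i][j].

    A state with no observed departures keeps a row of zeros.
    """
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row] if total else [0.0] * len(row))
    return probs

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def expected_occupation_times(p, horizon):
    """Entry [i][j]: expected number of steps 1..horizon spent in state j,
    given that the process starts in state i. Computed as P + P^2 + ... + P^horizon.
    """
    n = len(p)
    total = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(horizon):
        power = matmul(power, p)
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return total

# Toy transition frequencies between two states (e.g. employed / not employed).
counts = [[90, 10], [20, 80]]
p_hat = estimate_transition_matrix(counts)
occupation = expected_occupation_times(p_hat, horizon=3)
```

Each row of `occupation` sums to the horizon, since the process occupies exactly one state at every step.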
|
5 |
Avaliação microeconômica do comportamento de investidores frente às alterações de condições de mercado: os determinantes da não racionalidade dos investidores no mercado de fundos brasileiros
Fernandez Gonzalez, Ramon Francisco, 25 May 2015
In this work we seek to identify the main determinants of demand for investment funds in Brazil using the logit model, which is widely used in industrial organization theory. Wherever possible, we link the results to the main concepts of behavioral finance. We thereby clarify the main variables that drive variations in market share in the investment fund industry. We conclude that the main indicators investors observe when making decisions are the CDI, inflation, the real interest rate, and the variation of the dollar and of the stock market; in addition, the accumulated return over the last three months is a decisive factor in whether investors invest in or redeem from an investment fund. Risk and expected-return variables, which we had expected to have a strong impact, were not significant for variations in share.
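For readers unfamiliar with logit demand in industrial organization, the sketch below shows one common form: market shares follow a multinomial logit in each product's mean utility, and the mean utilities can be recovered from observed shares as ln(s_j) - ln(s_0), where s_0 is the share of an outside option. Whether the thesis uses exactly this specification is not stated in the abstract; the function names and the outside-option normalisation are assumptions for illustration.

```python
import math

def logit_shares(mean_utilities):
    """Market shares of the inside goods under a multinomial logit.

    The outside option's utility is normalised to 0, so its share is
    1 / (1 + sum_j exp(delta_j)).
    """
    denom = 1.0 + sum(math.exp(d) for d in mean_utilities)
    return [math.exp(d) / denom for d in mean_utilities]

def invert_shares(shares):
    """Recover mean utilities from shares: delta_j = ln(s_j) - ln(s_0)."""
    s0 = 1.0 - sum(shares)  # share of the outside option
    return [math.log(s) - math.log(s0) for s in shares]

# Round trip on toy mean utilities for three hypothetical funds.
deltas = [0.5, -1.0, 2.0]
shares = logit_shares(deltas)
recovered = invert_shares(shares)
```

The recovered values can then serve as the dependent variable in a linear regression on fund characteristics and market indicators, which is what makes the approach usable with aggregated market-share data.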
|
6 |
Improving Knowledge of Truck Fuel Consumption Using Data Analysis
Johnsen, Sofia, Felldin, Sarah, January 2016
Research has established the large potential of big data and the value it has brought to various industries. Because big data can reveal information that supports decision making in an organization when it is handled and analyzed in the right way, this thesis is conducted as a case study at an automotive manufacturer with access to large amounts of customer usage data from its vehicles. The motivation for analyzing this kind of data rests on the cornerstones of Total Quality Management, with the end objective of increasing customer satisfaction with the products or services concerned. The case study includes a data analysis exploring whether and how patterns in what affects fuel consumption can be revealed from aggregated customer usage data of trucks linked to truck applications. Based on the case study, conclusions are drawn about how a company can use this type of analysis and how to handle the data in order to turn it into business value. The data analysis reveals properties describing truck usage using Factor Analysis and Principal Component Analysis. One property in particular is concluded to be important, as it appears in the results of both techniques. Based on these properties, the trucks are clustered using k-means and Hierarchical Clustering, which shows groups of trucks where the importance of the properties varies. Due to the homogeneity and complexity of the chosen data, the clusters of trucks cannot be linked to truck applications. This would require data that is more easily interpretable. Finally, the importance of the properties for fuel consumption in the clusters is explored using model estimation. A comparison of Principal Component Regression (PCR) and the two regularization techniques Lasso and Elastic Net is made. PCR results in poor models that are difficult to evaluate. The two regularization techniques, however, outperform PCR, both giving a higher and very similar explained variance. The three techniques do not show obvious similarities in the models, and no conclusions can therefore be drawn concerning what is important for fuel consumption.
During the data analysis many problems with the data are discovered, which are linked to managerial and technical issues of big data. As a result, for example, some of the parameters of interest for the analysis cannot be used, which is likely to contribute to the inability to get consistent results across the model estimations. It is also concluded that the data was not originally intended for this type of analysis of large populations, but rather for testing and engineering purposes. Nevertheless, this type of data still contains valuable information and can be used if managed in the right way. From the case study it can be concluded that, in order to use the data for more advanced analysis, a big-data plan is needed at a strategic level in the organization. The plan summarizes the suggested solution to the organization's managerial big-data issues. It describes how to handle the data, how the analytic models revealing the information should be designed, and the tools and organizational capabilities needed to support the people using the information.
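The clustering step mentioned in the abstract can be illustrated with a minimal k-means sketch (Lloyd's algorithm). The function name, the naive initialisation from the first k points and the toy data are assumptions for illustration; the abstract does not describe the thesis's implementation or data.

```python
def kmeans(points, k, iterations=100):
    """Cluster points (sequences of numbers) into k groups with Lloyd's algorithm.

    Initialisation is deliberately naive and deterministic: the first k points
    serve as starting centroids. Returns (assignment, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Toy usage "properties" for six hypothetical trucks, forming two obvious groups.
trucks = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centers = kmeans(trucks, k=2)
```

In practice the inputs would be the usage properties obtained from the factor or principal component analysis, and the initialisation would usually be randomised and repeated.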
|