1. Causal Inference Using Propensity Score Matching in Clustered Data / Oelrich, Oscar / January 2014
Propensity score matching is commonly used to estimate causal effects of treatments. However, when using data with a hierarchical structure, we need to take the multilevel nature of the data into account. In this thesis the estimation of propensity scores with multilevel models is presented to extend propensity score matching for use with multilevel data. A Monte Carlo simulation study is performed to evaluate several different estimators. It is shown that propensity score estimators ignoring the multilevel structure of the data are biased, while fixed effects models produce unbiased results. An empirical study of the causal effect of truancy on mathematical ability for Swedish 9th graders is also performed, where it is shown that truancy has a negative effect on mathematical ability.
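A hedged sketch of the fixed-effects approach the simulation study favours: a logistic propensity model with one dummy variable per cluster, followed by 1:1 nearest-neighbour matching on the estimated scores. The simulated data and all variable names below are illustrative, not the thesis code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical clustered data: 50 clusters, 40 subjects each.
n_clusters, n_per = 50, 40
cluster = np.repeat(np.arange(n_clusters), n_per)
u = rng.normal(0, 1, n_clusters)            # cluster-level effect
x = rng.normal(0, 1, n_clusters * n_per)    # subject-level covariate

# Treatment assignment depends on x and the cluster effect.
p_treat = 1 / (1 + np.exp(-(0.5 * x + u[cluster])))
treat = rng.binomial(1, p_treat)
y = 1.0 * treat + 0.8 * x + u[cluster] + rng.normal(0, 1, x.size)

# Fixed-effects propensity model: one dummy per cluster (no intercept).
dummies = (cluster[:, None] == np.arange(n_clusters)).astype(float)
design = np.column_stack([x, dummies])
ps = sm.Logit(treat, design).fit(disp=0).predict(design)

# 1:1 nearest-neighbour matching of treated units to controls on the score.
treated_idx = np.flatnonzero(treat == 1)
control_idx = np.flatnonzero(treat == 0)
matches = control_idx[np.abs(ps[treated_idx][:, None] -
                             ps[control_idx][None, :]).argmin(axis=1)]
att = (y[treated_idx] - y[matches]).mean()
print(f"Matched estimate of the treatment effect: {att:.2f}")
```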
2. Realization Methods for the Quadtree Morphological Filter with Their Applications / Chen, Yung-lin / 7 September 2011
The proposed method combines the quadtree algorithm with morphological image processing and improves on the previous pattern mapping method to achieve faster processing.
The previous pattern mapping method stores the tree pattern in string form, which is a pointerless data structure. In the proposed method the tree pattern is saved in a pointer-based data structure, so the tree can be applied to the quadtree immediately, without the transformation time required by the previous pattern mapping method.
In this paper, the pointerless quadtree is thus converted to a pointer quadtree to reduce processing time. The modified algorithm is applied to circuit detection, image restoration, image segmentation, and cell counting.
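Below is a minimal sketch of a pointer-based region quadtree of the kind the abstract refers to, built by recursive subdivision of a binary image. It is a generic construction with hypothetical names, not the paper's implementation.

```python
import numpy as np

class QuadNode:
    """Pointer-based quadtree node: a leaf stores a uniform value,
    an internal node stores pointers to its four children."""
    __slots__ = ("value", "children")

    def __init__(self, value=None, children=None):
        self.value = value          # 0/1 for uniform leaves, None otherwise
        self.children = children    # [NW, NE, SW, SE] or None for leaves

def build_quadtree(img):
    """Recursively subdivide a square binary image until blocks are uniform."""
    if img.min() == img.max():                  # homogeneous block -> leaf
        return QuadNode(value=int(img[0, 0]))
    h = img.shape[0] // 2
    children = [build_quadtree(img[:h, :h]),    # NW
                build_quadtree(img[:h, h:]),    # NE
                build_quadtree(img[h:, :h]),    # SW
                build_quadtree(img[h:, h:])]    # SE
    return QuadNode(children=children)

# Example: an 8x8 image with a filled 4x4 square in the top-left corner.
img = np.zeros((8, 8), dtype=np.uint8)
img[:4, :4] = 1
root = build_quadtree(img)
print(root.value is None, root.children[0].value)   # internal root, NW leaf = 1
```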
3. Three Essays on Correlated Binary Outcomes: Detection and Appropriate Models / January 2018
Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types of association. The first paper introduces an ANOVA-based measure of intraclass correlation for three-level hierarchical data with binary outcomes, and corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. This measure is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. Thus, this approach takes into account both the correlation between the multivariate outcomes, as well as the correlation due to time-dependency in longitudinal studies. The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children. / Dissertation/Thesis / Doctoral Dissertation Statistics 2018
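For context, the classical one-way ANOVA estimator of the intraclass correlation for two-level clustered data is

\[
\hat{\rho}_{\mathrm{ANOVA}} = \frac{MS_B - MS_W}{MS_B + (n_0 - 1)\,MS_W},
\qquad
n_0 = \frac{N - \sum_{i=1}^{k} n_i^2 / N}{k - 1},
\]

where k is the number of clusters, n_i the size of cluster i, N = \sum_i n_i, and MS_B, MS_W the between- and within-cluster mean squares. The dissertation's contribution is a three-level analogue for binary outcomes; this standard two-level formula is shown here only as a reference point.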
4. THE USE OF HDF IN F-22 AVIONICS TEST AND EVALUATION / Barnum, Jil / October 1996
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
Hierarchical Data Format (HDF) is a public domain standard for file formats that is documented and maintained by the National Center for Supercomputing Applications. HDF is the standard adopted by the F-22 program to increase the efficiency of avionics data processing and the utility of the data. This paper discusses how the data processing Integrated Product Team (IPT) on the F-22 program plans to use HDF for file format standardization. The history of the IPT's choice of HDF, the efficiencies gained by choosing HDF, and the ease of data transfer are explained.
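As a hedged illustration of the hierarchical layout HDF provides, the sketch below uses HDF5 and h5py, the format's current incarnation; the file, group, and dataset names are hypothetical and are not the F-22 IPT's actual formats.

```python
import numpy as np
import h5py

# Hypothetical avionics-style layout: groups form the hierarchy,
# datasets hold the telemetry samples, attributes hold metadata.
with h5py.File("flight_test.h5", "w") as f:
    run = f.create_group("flight_001/avionics")
    run.attrs["date"] = "1996-10-28"
    run.create_dataset("radar/range_m", data=np.random.rand(1000))
    run.create_dataset("nav/altitude_ft", data=np.random.rand(1000))

with h5py.File("flight_test.h5", "r") as f:
    f.visit(print)   # walk the hierarchy: flight_001, flight_001/avionics, ...
```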
5. A Spreadsheet Model for Using Web Services and Creating Data-Driven Applications / Chang, Kerry Shih-Ping / 1 April 2016
Web services have made many kinds of data and computing services available. However, using web services often requires significant programming effort, which limits the people who can take advantage of them to a small group of skilled programmers. In this dissertation, I will present a tool called Gneiss that extends the spreadsheet model to support four challenging aspects of using web services: programming two-way data communications with web services, creating interactive GUI applications that use web data sources, using hierarchical data, and using live streaming data. Gneiss contributes innovations in spreadsheet languages, spreadsheet user interfaces, and interaction techniques to allow programming tasks that currently require writing complex, lengthy code to instead be done using familiar spreadsheet mechanisms. Spreadsheets are arguably the most successful and popular data tools among people of all programming levels. This work advances the use of spreadsheets to new domains and could benefit a wide range of users, from professional programmers to end-user programmers.
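A hedged sketch of the kind of glue code the dissertation aims to replace with spreadsheet mechanisms: flattening a hierarchical JSON response from a web service into spreadsheet-like rows. The service response and field names are invented, and this generic Python is not Gneiss itself.

```python
import json

# Pretend response from a hypothetical web service.
payload = json.loads("""
{"results": [
  {"name": "Alice", "address": {"city": "Pittsburgh"}, "scores": [1, 2]},
  {"name": "Bob",   "address": {"city": "Seattle"},    "scores": [3]}
]}
""")

# Flatten nested objects into column-per-path rows, one row per list item.
rows = [{"name": r["name"],
         "address.city": r["address"]["city"],
         "scores": ", ".join(map(str, r["scores"]))}
        for r in payload["results"]]
for row in rows:
    print(row)
```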
6. Administration Service for the Tourist Information System (TIP) / Hsieh, Ping-Ju / January 2008
Modern-day tourists do not want to deal with the hassle of using a large number of travel guides and paper maps while travelling. They would prefer to access the required information via their mobile phones or Personal Digital Assistants (PDAs). We realise that the delivered information may originally be available in numerous formats. To support the administrator of the tourist guides, a programme is required that helps sort information from these different sources and insert it into the system. Our goal with this project is to develop software support for processing information imports via a graphical user interface, supporting the administrator in identifying and extracting the appropriate sight information from various resources. The interface also helps in transferring and storing the structured and unstructured data into the TIP database.
7. Three Essays on Comparative Simulation in Three-level Hierarchical Data Structure / January 2017
Though the likelihood is a useful tool for obtaining estimates of regression parameters, it is not readily available when fitting hierarchical binary data models. The correlated observations remove the possibility of a joint likelihood when fitting hierarchical logistic regression models. Inferences for the regression and covariance parameters, as well as the intraclass correlation coefficients, are usually obtained through a conditional likelihood. In those cases, I have resorted to the Laplace approximation and large-sample-theory approaches for point and interval estimates such as Wald-type confidence intervals and profile likelihood confidence intervals. These methods rely on distributional assumptions and large-sample theory; when dealing with small hierarchical datasets, however, they often result in severe bias or non-convergence. I present a generalized quasi-likelihood approach and a generalized method of moments approach; neither relies on distributional assumptions, only on moments of the response. As an alternative to the typical large-sample-theory approach, I present bootstrapping of hierarchical logistic regression models, which provides more accurate interval estimates for small binary hierarchical data, substituting computation for the traditional Wald-type and profile likelihood confidence intervals. I use a latent variable approach with a new split bootstrap method for estimating intraclass correlation coefficients when analyzing binary data obtained from a three-level hierarchical structure. It is especially useful with small sample sizes and is easily extended to more levels. Comparisons are made to existing approaches through both theoretical justification and simulation studies. Further, I demonstrate my findings through the analysis of three numerical examples: one based on cancer-in-remission data, one related to China's antibiotic abuse study, and a third related to teacher effectiveness in schools in a southwestern US state. / Dissertation/Thesis / Doctoral Dissertation Statistics 2017
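A minimal sketch of the cluster bootstrap idea behind such interval estimates, applied to the classical ANOVA ICC for two-level binary data. The thesis's split bootstrap and latent-variable machinery for three levels is more elaborate, and the data below are purely illustrative.

```python
import numpy as np

def anova_icc(groups):
    """One-way ANOVA ICC estimate from a list of per-cluster 0/1 arrays."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    N = n.sum()
    means = np.array([g.mean() for g in groups])
    grand = np.concatenate(groups).mean()
    ssb = (n * (means - grand) ** 2).sum()
    ssw = sum(((g - m) ** 2).sum() for g, m in zip(groups, means))
    msb, msw = ssb / (k - 1), ssw / (N - k)
    n0 = (N - (n ** 2).sum() / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

rng = np.random.default_rng(1)
# Illustrative data: 30 clusters of 15 subjects with differing success rates.
data = [rng.binomial(1, p, 15).astype(float)
        for p in rng.beta(2, 5, size=30)]

# Resample whole clusters with replacement and recompute the ICC each time.
boot = [anova_icc([data[i] for i in rng.integers(0, len(data), len(data))])
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"ICC = {anova_icc(data):.3f}, 95% percentile CI = ({lo:.3f}, {hi:.3f})")
```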
8. Variable selection in joint modelling of mean and variance for multilevel data / Charalambous, Christiana / January 2011
We propose to extend the use of penalized likelihood based variable selection methods to hierarchical generalized linear models (HGLMs) for jointly modelling both the mean and variance structures. We are interested in applying these new methods on multilevel structured data, hence we assume a two-level hierarchical structure, with subjects nested within groups. We consider a generalized linear mixed model (GLMM) for the mean, with a structured dispersion in the form of a generalized linear model (GLM). In the first instance, we model the variance of the random effects which are present in the mean model, or in other words the variation between groups (between-level variation). In the second scenario, we model the dispersion parameter associated with the conditional variance of the response, which could also be thought of as the variation between subjects (within-level variation). To do variable selection, we use the smoothly clipped absolute deviation (SCAD) penalty, a penalized likelihood variable selection method, which shrinks the coefficients of redundant variables to 0 and at the same time estimates the coefficients of the remaining important covariates. Our methods are likelihood based and so in order to estimate the fixed effects in our models, we apply iterative procedures such as the Newton-Raphson method, in the form of the LQA algorithm proposed by Fan and Li (2001). We carry out simulation studies for both the joint models for the mean and variance of the random effects, as well as the joint models for the mean and dispersion of the response, to assess the performance of our new procedures against a similar process which excludes variable selection. The results show that our method increases both the accuracy and efficiency of the resulting penalized MLEs and has a 100% success rate in identifying the zero and non-zero components over 100 simulations. For the main real data analysis, we use the Health Survey for England (HSE) 2004 dataset. We investigate how obesity is linked to several factors such as smoking, drinking, exercise and long-standing illness, to name a few. We also discover whether there is variation in obesity between individuals and between households of individuals, as well as test whether that variation depends on some of the factors affecting obesity itself.
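For reference, the SCAD penalty of Fan and Li (2001) mentioned above is defined through its derivative, for \theta > 0 and with the usual choice a = 3.7:

\[
p_{\lambda}'(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_{+}}{(a - 1)\lambda}\, I(\theta > \lambda) \right\},
\]

so small coefficients are shrunk exactly to zero while large coefficients are left essentially unpenalised.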
9. A Novel Report Generation Approach for Medical Applications: The SISDS Methodology and Its Applications / Kuru, Kaya / 1 February 2010
In medicine, reliable data are available only in a few areas, and necessary information on prognostic implications is generally missing. In spite of the fact that a great amount of money has been invested to ease the process, an effective solution has yet to be found. Unfortunately, existing data collection approaches in medicine seem inadequate to provide accurate and high-quality data, which is a prerequisite for building a robust and effective DDSS. In this thesis, the many different medical reporting methodologies and systems that have been used up to now are evaluated; their strengths and deficiencies are revealed to shed light on how to set up an ideal medical reporting approach. This thesis presents a new medical reporting method, namely the “Structured, Interactive, Standardized and Decision Supporting Method” (SISDS), which encompasses most of the favorable features of the existing medical reporting methods while removing most of their deficiencies, such as inefficiency and cognitive overload, as well as introducing promising new advantages. The method enables professionals to produce multilingual medical reports much more efficiently than the existing approaches, in a novel way, by allowing free-text-like data entry in a structured form. The proposed method is shown to be more effective from many perspectives, such as facilitating complete and accurate data collection and providing opportunities to build a DDSS without tedious pre-processing and data preparation steps, ultimately helping health care professionals practice better medicine.
10. Optimization of the Distributed I/O Subsystem of the k-Wave Project / Vysocký, Ondřej / January 2016
This thesis deals with an effective solution for the parallel I/O of the k-Wave tool, which is designed for time-domain acoustic and ultrasound simulations. k-Wave is a supercomputer application; it runs on a Lustre file system, is implemented with MPI, and stores its data in a suitable data format (HDF5). I designed three optimization methods that fit k-Wave's needs, based on accumulation and redistribution techniques. In comparison with the native write, every optimization method led to better write speed, up to 13.6 GB/s. These methods can be used to optimize any distributed application with a write-speed bottleneck.
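A hedged, much simplified sketch of the accumulation-and-redistribution idea, written with mpi4py; the aggregation factor, array sizes, and per-group output files are illustrative and not the actual k-Wave implementation.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

AGG = 4                      # every AGG ranks share one writer (aggregator)
local = np.full(1_000_000, rank, dtype=np.float32)   # this rank's simulation slab

# Accumulate: gather the slabs of each group of AGG ranks onto the group root.
group = comm.Split(color=rank // AGG, key=rank)
chunks = group.gather(local, root=0)

# Only the aggregators touch the file system, writing large contiguous blocks.
if group.Get_rank() == 0:
    block = np.concatenate(chunks)
    block.tofile(f"output_part_{rank // AGG:04d}.bin")

group.Free()
```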