1 |
Estimating standard errors of estimated variance components in generalizability theory using bootstrap procedures. Moore, Joann Lynn, 01 December 2010
This study investigated the extent to which rules proposed by Tong and Brennan (2007) for estimating standard errors of estimated variance components held up across a variety of G theory designs, variance component structures, sample size patterns, and data types. Simulated data was generated for all combinations of conditions, and point estimates, standard error estimates, and coverage for three types of confidence intervals were calculated for each estimated variance component and relative and absolute error variance across a variety of bootstrap procedures for each combination of conditions. It was found that, with some exceptions, Tong and Brennan's (2007) rules produced adequate standard error estimates for normal and polytomous data, while some of the results differed for dichotomous data. Additionally, some refinements to the rules were suggested with respect to nested designs. This study provides support for the use of bootstrap procedures for estimating standard errors of estimated variance components when data are not normally distributed.
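To make the bootstrap idea concrete, here is a minimal sketch for the simplest crossed design, persons × items (p × i), assuming complete data with at least two persons and two items. It estimates variance components with the usual ANOVA expected-mean-squares equations, then resamples persons with replacement (the scheme often called boot-p) and takes the standard deviation of the replicated estimates as each standard error. The function names are hypothetical, and this naive version does not apply the adjustment rules of Tong and Brennan (2007) that the study evaluated; without such adjustments, boot-p estimates for some components can be biased.

```python
import random
import statistics


def variance_components_pxi(scores):
    """ANOVA (expected-mean-squares) estimates for a crossed p x i design.

    scores: one row per person, each row holding that person's n_i item scores
    (complete data assumed). Returns (var_p, var_i, var_pi_e). Estimates are
    not truncated at zero, so negative values can appear in small samples.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pi = ss_total - ss_p - ss_i  # residual: p x i interaction plus error

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

    var_pi_e = ms_pi
    var_p = (ms_p - ms_pi) / n_i
    var_i = (ms_i - ms_pi) / n_p
    return var_p, var_i, var_pi_e


def boot_p_standard_errors(scores, n_boot=1000, seed=0):
    """Resample persons (rows) with replacement; return the SD of each estimate."""
    rng = random.Random(seed)
    n_p = len(scores)
    replicates = []
    for _ in range(n_boot):
        resampled = [scores[rng.randrange(n_p)] for _ in range(n_p)]
        replicates.append(variance_components_pxi(resampled))
    return tuple(statistics.stdev([rep[k] for rep in replicates]) for k in range(3))


if __name__ == "__main__":
    # Simulated normal data: person effect + item effect + residual noise.
    rng = random.Random(42)
    persons = [rng.gauss(0, 1.0) for _ in range(50)]
    items = [rng.gauss(0, 0.5) for _ in range(10)]
    data = [[p + i + rng.gauss(0, 0.7) for i in items] for p in persons]
    print("estimates:", variance_components_pxi(data))
    print("boot-p SEs:", boot_p_standard_errors(data, n_boot=200))
```

Resampling only persons keeps the item set fixed in every replicate; schemes that also resample items (or both facets) would need a different routine.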
|
2 |
An Analysis of Test Construction Procedures and Score Dependability of a Paramedic Recertification Exam. de Vries, Ingrid, 08 September 2012
High-stakes testing is used to provide results that have important consequences, such as certification, licensing, or credentialing. The purpose of this study was to examine aspects of an exam recently written by flight paramedics for recertification and to make recommendations for the development of future exams. In 2008, an unexpectedly high failure rate led to revisions in the exam development process for flight paramedics. Using principles of classical test theory and generalizability theory, I examined the decision consistency and dependability of the examination and found the decision consistency for dichotomous items to be within acceptable limits, yet the dependability was low. Discrimination was strong at the cut-score. An in-depth look into the process used to set the exam, as well as into the psychometric properties of the exam and its items, has led to recommendations that will contribute to the future development of dependable exams in the industry and to more valid interpretations of paramedic competence. / Thesis (Master, Education) -- Queen's University, 2012-09-06 22:41:41.552
|
3 |
Decomposing Variance Components for Risk Perceptions Using Generalizability Theory. Wang, Yi, 24 August 2017
No description available.
|
4 |
Emotion Regulation and Religiosity: A Repeated Measures Approach. Alison M Haney (7046648), 16 October 2019
Religious faith has been identified as a protective factor against negative psychological outcomes and is associated with a range of positive mental and physical health outcomes. While religion is thought to confer psychological benefits to believers in part by enhancing emotion regulation abilities and providing faith-based regulatory methods such as religious coping, these associations have not been examined empirically. This may be due to a lack of measures appropriate for repeated measures contexts, which are needed for accurate assessment of dynamic constructs such as emotions and regulation. This study employed generalizability theory in a sample (N = 146) collected in daily diary format over 21 days to determine the reliability of commonly used measures of religiosity and religious coping at the daily level. Once reliability was established, varying time scales were used in a multilevel modeling framework to examine the associations among intrinsic religiosity, religious coping, positive and negative affect, and difficulties in emotion regulation. Positive religious coping (PRC) measured at baseline, on the same day, and at a 1-day lag was associated with higher levels of daily positive affect, though PRC was also associated with negative affect when measured on the same day. Negative religious coping (NRC) measured at baseline predicted lower levels of daily positive affect and was associated with higher levels of negative affect when measured on the same day and at a 1-day lag. NRC was also associated with greater difficulties in emotion regulation at all measurement periods, whereas PRC and intrinsic religiosity were not significantly associated with emotion regulation difficulties. Although not associated with daily positive or negative affect, intrinsic religiosity was found to enhance the effect of positive affect inertia. These results did not support the conceptualization that religiosity broadly promotes adaptive emotion regulation; rather, they suggest that intrinsic religiosity may increase positive affect by amplifying the effects of positive affect inertia. Additional work with more measurement occasions is needed to fully understand the temporal associations among these constructs.
|
5 |
An Investigation of the Parenting Stress Index in the Context of Generalizability Theory. Sharpnack, Jim D., 01 May 1997
The present study examined the application of generalizability theory (GT) to the Parenting Stress Index (PSI) long and short forms for families having children with disabilities. The purpose of the study was to evaluate the dependability of parenting stress scores gathered from families having children with disabilities. The data for the present study came from an extant data set collected by the Early Intervention Research Institute (EIRI; Contract #800-85-0173) at Utah State University. The EIRI studies represented attempts to assess the benefits and costs of conducting early intervention programs. The EIRI data were recoded at the item level for the Psychometrics Project, which established norms, reliability, and validity information on self-report, family-functioning measures gathered from families having children with disabilities.
The GT study results suggested that the items facet made a large contribution to score variance, indicating that there may not be any established trends in item responses. One interpretation of the items-facet result is that the PSI forms provide an accurate measure of overall parental stress. According to the times-facet results, the effects of time are minimal except for the increase from occasion one to occasion two. Classical reliability theory (CRT) and GT analyses provide contradictory results, probably because GT analyzes multiple error sources at once, whereas CRT examines a single error source in each analysis.
GT study analyses indicate that the highest g and phi coefficients are produced with the largest numbers of administrations and items. However, using the maximum number of administrations and items would be impractical in any setting. The original number of items from the Parent Domain, Child Domain, and short PSI total score should be administered twice to increase the dependability of scores while remaining within practical limits.
A researcher and/or practitioner may want information to decide what form, long or short, to choose. If the PSI is to be used as a quick screening tool or as one test in a complete assessment, the short form may be of more use. If the PSI is to be used as a primary source of information about parent and child interactive systems, the long PSI version would be recommended.
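The trade-off between items and administrations described above is what a decision (D) study computes. The sketch below assumes a fully crossed, random persons × items × occasions (p × i × o) design and uses the standard random-effects formulas for the G coefficient (relative error) and the phi coefficient (absolute error). The class and function names are hypothetical, and the variance components in the example are invented for illustration; they are not the PSI estimates from this study.

```python
from dataclasses import dataclass


@dataclass
class PxIxOComponents:
    """Variance components for a fully crossed, random p x i x o design."""
    p: float      # persons (object of measurement)
    i: float      # items
    o: float      # occasions
    pi: float     # person x item
    po: float     # person x occasion
    io: float     # item x occasion
    pio_e: float  # person x item x occasion, confounded with residual error


def d_study(vc: PxIxOComponents, n_i: int, n_o: int) -> tuple[float, float]:
    """Return (G coefficient, phi coefficient) for n_i items and n_o occasions."""
    # Relative error: only effects that interact with persons.
    relative_error = vc.pi / n_i + vc.po / n_o + vc.pio_e / (n_i * n_o)
    # Absolute error adds the facet main effects and their interaction.
    absolute_error = relative_error + vc.i / n_i + vc.o / n_o + vc.io / (n_i * n_o)
    g = vc.p / (vc.p + relative_error)
    phi = vc.p / (vc.p + absolute_error)
    return g, phi


if __name__ == "__main__":
    # Hypothetical components for illustration only.
    vc = PxIxOComponents(p=0.30, i=0.05, o=0.01, pi=0.20, po=0.04, io=0.01, pio_e=0.40)
    for n_o in (1, 2, 3):
        for n_i in (10, 30):
            g, phi = d_study(vc, n_i, n_o)
            print(f"items={n_i:2d} occasions={n_o}  G={g:.3f}  phi={phi:.3f}")
```

Because phi adds the item and occasion main effects to the error term, it can never exceed G for the same design.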
|
6 |
Interrater Agreement and Reliability of Observed Behaviors: Comparing Percentage Agreement, Kappa, Correlation Coefficient, ICC and G Theory. Cao, Qian, 02 October 2013
The study of interrater agreement and interrater reliability attracts extensive attention because judgments from multiple raters are subjective and may vary across individuals. To evaluate interrater agreement and interrater reliability, five different methods or indices are proposed: percentage of agreement, the kappa coefficient, the correlation coefficient, the intraclass correlation coefficient (ICC), and generalizability (G) theory.
In this study, we introduce these methods and discuss their advantages and disadvantages for evaluating interrater agreement and reliability. We then rank the five indices by how frequently they have been used in practice over the past five years. Finally, we illustrate how to use the five methods under different circumstances and provide SPSS and SAS code for analyzing interrater agreement and reliability.
We apply these methods to data from the Parent-Child Interaction System of global ratings (PARCHISY) and conclude as follows: (1) ICC has been the most frequently used method for evaluating interrater reliability over the past five years, while generalizability theory has been the least frequently used. Based on the criteria, the G coefficients indicate interrater reliability similar to that of weighted kappa and ICC on most items. (2) When reliability is high, the different methods give consistent indications of interrater reliability under their respective criteria. When the methods disagree, both the ICC and the G coefficient indicate better interrater reliability based on the criteria, and the two give consistent results.
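The difference between raw percentage agreement and kappa, two of the indices compared above, can be shown in a few lines. This is a minimal sketch of unweighted Cohen's kappa for two raters assigning nominal categories; the function names are hypothetical, and the weighted kappa mentioned in the results (which credits partial agreement on ordinal scales) would additionally need a disagreement-weight matrix.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Proportion of subjects on which two raters assign the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions.
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
    b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
    print("agreement:", percent_agreement(a, b))
    print("kappa:", round(cohens_kappa(a, b), 3))
```

In the example the raters agree on 8 of 10 subjects, but because both use category 1 often, chance agreement is already 0.52, so kappa is noticeably lower than the raw agreement.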
|
7 |
Assessment and Reporting of Intercoder Reliability in Published Meta-Analyses Related to Preschool Through Grade 12 Education. Raffle, Holly, 10 October 2006
No description available.
|
8 |
Social support, mood, and relationship satisfaction at the trait and social levels. Williamson, J Austin, 01 July 2015
Many social processes influence the amount, quality, and availability of support from an individual's social network. Trait influences are characteristics of the individual that generalize across relationships and affect how much support is received and perceived on average from other people. Social influences comprise characteristics of the individual's social network. They are relationship specific and account for the variability in supportiveness among an individual's providers. Recent studies have taken a multilevel approach to studying social support in order to partition the variance in sets of relationship-specific support measures into trait and social components. These studies have also used multivariate generalizability (G) theory to examine the correlations between social support and other constructs, such as negative mood, at the trait and social level.
These multilevel studies have begun to clarify the relative contributions of trait and social influences on social support, but much is yet to be learned about the nature and measurement of social support's trait and social components. One set of aims within this project was to identify characteristics of support recipients and characteristics of support providers that were related to the reception and perception of social support. Another set of aims focused on validating the measurement strategies used by G theory researchers and understanding how the trait and social components of support and mood derived from relationship-specific measures relate to traditional measures of these constructs. My final set of aims involved the application of multilevel analyses of social support and negative mood to three existing theories in the social support literature--the buffering hypothesis, the matching hypothesis, and the platinum rule.
The participants in this study comprised two samples--one group of 755 undergraduate psychology students, and one group of 430 community members from across the United States. Participants completed measures of their personality traits, recent depressive symptoms, recent experiences of life adversity and perceived control over life adversity. They also reported on three close relationships including support from those relationships, satisfaction with those relationships, and mood experienced when interacting with those three people.
Several multilevel analyses were used in the study. Univariate G theory analyses were used to quantify the relative variance in support, mood, and relationship satisfaction attributable to trait and social influences. Multivariate G theory analyses were used to estimate the links between these variables at the trait and social levels of analysis. Mixed effects models were used to identify trait and relationship-specific constructs that might partly constitute the trait and social influences on social support. Multilevel Structural Equation Modeling (SEM) was used to evaluate the validity of several constructs employed in previous multilevel studies on social support. Finally, mixed effects and multivariate G theory analyses were used to test the buffering hypothesis, the matching hypothesis, and the platinum rule.
Consistent with previous multilevel studies of social support, recipients who received more support, on average, from their social networks also reported more negative mood when interacting with their providers. After taking those average tendencies into account, the amount of support received from an individual support provider was not associated with negative mood experienced when with that provider. The investigation of the trait influences on social support showed that recipients who were younger, more extraverted, and more open to new experiences tended to receive more social support. Women tended to receive more support than men. With respect to social influences, romantic partners tended to provide the most support whereas friends and siblings provided significantly less support on average. Women tended to provide more support than men. The validity assessment showed that the social component of support availability was only modestly distinct from the social component of generic relationship satisfaction. The trait component of support availability showed good discriminant validity from relationship satisfaction and good convergent validity with global support availability. The trait component of relationship-specific mood showed moderate convergent validity with general mood. The buffering and matching hypotheses were not supported by my findings. The platinum rule was supported at the trait level in that recipients who reported greater support adequacy, on average, tended to report more positive mood and less negative mood. The platinum rule was also supported at the social level in that recipients tended to report experiencing the most positive mood and least negative mood when interacting with individual providers who tended to supply the most adequate support.
|
9 |
Competency Assessment in Nursing Using Simulation: A Generalizability Study and Scenario Validation Process. January 2014
The measurement of competency in nursing is critical to ensuring safe and effective patient care. This study had two purposes. First, the psychometric characteristics of the Nursing Performance Profile (NPP), an instrument used to measure nursing competency, were evaluated using generalizability theory and a sample of 18 nurses in the Measuring Competency with Simulation (MCWS) Phase I dataset. The relative magnitudes of various error sources and their interactions were estimated in a generalizability study involving a fully crossed, three-facet random design with nurse participants as the object of measurement and scenarios, raters, and items as the three facets. A design corresponding to that of the MCWS Phase I data (three scenarios, three raters, and 41 items) showed that nurse participants contributed the greatest proportion of total variance (50.00%), followed, in decreasing magnitude, by raters (19.40%), the two-way participant x scenario interaction (12.93%), and the two-way participant x rater interaction (8.62%). The generalizability (G) coefficient was .65 and the dependability coefficient was .50. In decision study designs minimizing the number of scenarios, the desired generalizability coefficients of .70 and .80 were reached with three scenarios and five raters, and with five scenarios and nine raters, respectively. In designs minimizing the number of raters, G coefficients of .72 and .80 were reached with three raters and five scenarios, and with four raters and nine scenarios, respectively. A dependability coefficient of .71 was attained with six scenarios and nine raters, or with seven raters and nine scenarios. Achieving high reliability with designs involving fewer raters may be possible with enhanced rater training to decrease the variance components for rater main and interaction effects.
The second part of this study involved the design and implementation of a validation process for evidence-based human patient simulation scenarios in assessment of nursing competency. A team of experts validated the new scenario using a modified Delphi technique, involving three rounds of iterative feedback and revisions. In tandem, the psychometric study of the NPP and the development of a validation process for human patient simulation scenarios both advance and encourage best practices for studying the validity of simulation-based assessments. / Dissertation/Thesis / Doctoral Dissertation Educational Psychology 2014
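The decision-study search described above can be sketched for a simplified, fully crossed p × s × r random design (participants × scenarios × raters). This sketch drops the item facet that the study also modeled, and the variance components in the example are hypothetical rather than the study's estimates; the function names are likewise illustrative. Because rater and scenario main effects enter absolute error but not relative error, a large rater component pulls the dependability (phi) coefficient below the G coefficient. For each number of scenarios, the search returns the fewest raters that reach a target value.

```python
def coefficients_psr(vc: dict, n_s: int, n_r: int) -> tuple[float, float]:
    """(G, phi) for a crossed random p x s x r design.

    vc keys (assumed): p, s, r, ps, pr, sr, psr_e (residual).
    """
    relative_error = vc["ps"] / n_s + vc["pr"] / n_r + vc["psr_e"] / (n_s * n_r)
    absolute_error = relative_error + vc["s"] / n_s + vc["r"] / n_r + vc["sr"] / (n_s * n_r)
    return vc["p"] / (vc["p"] + relative_error), vc["p"] / (vc["p"] + absolute_error)


def smallest_designs(vc: dict, target: float, max_s: int = 10, max_r: int = 10,
                     use_phi: bool = False) -> dict:
    """Map each n_s to the fewest raters reaching the target (G or phi).

    Scenario counts for which no n_r <= max_r reaches the target are omitted.
    Both coefficients increase with n_r, so the first hit is the minimum.
    """
    result = {}
    for n_s in range(1, max_s + 1):
        for n_r in range(1, max_r + 1):
            g, phi = coefficients_psr(vc, n_s, n_r)
            if (phi if use_phi else g) >= target:
                result[n_s] = n_r
                break
    return result


if __name__ == "__main__":
    # Hypothetical components with a large rater main effect.
    vc = {"p": 0.50, "s": 0.02, "r": 0.20, "ps": 0.12, "pr": 0.08, "sr": 0.03, "psr_e": 0.05}
    print("G and phi, 3 scenarios x 3 raters:", coefficients_psr(vc, 3, 3))
    print("fewest raters for G >= .80:", smallest_designs(vc, 0.80))
    print("fewest raters for phi >= .80:", smallest_designs(vc, 0.80, use_phi=True))
```

With a large rater component, reaching a given phi typically requires more raters than reaching the same G, which is consistent with the abstract's point that rater training aimed at shrinking rater effects could reduce the number of raters needed.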
|
10 |
Generalizability of Universal Screening Measures for Behavioral and Emotional Risk. Tanner, Nicholas Andrew, January 2017
Data derived from universal screening procedures are increasingly used by schools to identify students at risk of behavioral and emotional concerns and to provide them with additional supports. Because screening has the potential to be resource intensive, effort has been placed on developing efficient screening procedures, namely brief behavior rating scales. This study used classical test theory and generalizability theory to examine the extent to which differences among students, raters, occasions, and screening measures affect the meaningfulness of data derived from universal screening procedures. Teacher pairs from three middle school classrooms completed two brief behavior rating scales during fall and spring screening administrations for all students in their respective classrooms. Correlation coefficients examining interrater reliability, test-retest reliability, and concurrent validity were generally strong. Generalizability analyses indicated that the majority of variance in teacher ratings was attributable to student differences across all score comparisons, but differences between teacher ratings for particular students accounted for relatively large percentages of error variance among student behavior ratings. Although decision studies showed that increasing the number of screening occasions produced more generalizable data, increasing the number of raters yielded more efficient screening procedures.
|