1.
The Relationship Between Viewing Time and Sexual Attraction Ratings. Rees, Micah James, 01 June 2019.
The LOOK is an iPad-based application that measures sexual interest. It does this by recording the amount of time individuals take to view and rate the attractiveness of images of fully clothed people from differing age, gender, and racial demographics. Viewing-time measures such as the LOOK operate under the assumption that individuals view sexually attractive images longer than images they deem unattractive or sexually non-preferred. Although research supports the efficacy of these kinds of tests, there is little research supporting the assumption that viewing time correlates strongly with reported ratings of sexual preference. This study analyzed existing data from the LOOK to assess the nature of this correlation and how it varies across gender groups. The analysis found a moderate correlation between time spent rating an image (Rate-time) and the subsequent rating of sexual attraction (Ratings) in most age and gender categories. However, for both men and women, these correlations were significantly weaker, or were negative, in target categories (the categories in which participants gave their highest attraction ratings). Additionally, cluster analysis indicated two clusters within both the male and female participant groups that differed significantly in mean Rate-time, mean Ratings, and correlation coefficients. Given these results, the viewing-time theory that Rate-time is strongly associated with sexual attraction is questionable. A greater understanding of what viewing-time measures truly assess will require additional research.
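As a rough illustration of the kind of analysis described above, the sketch below computes Rate-time/Ratings correlations within gender-by-category cells and a two-cluster k-means solution on participant-level means, using base R and simulated data. The data frame, the column names (rate_time, rating, category, gender), and all values are hypothetical stand-ins, not the LOOK's actual data structure.

```r
# Simulated stand-in for LOOK data; every value and column name is hypothetical.
set.seed(1)
d <- data.frame(
  participant = rep(1:100, each = 8),
  gender      = rep(sample(c("male", "female"), 100, replace = TRUE), each = 8),
  category    = rep(paste0("cat", 1:8), times = 100),
  rate_time   = rlnorm(800, meanlog = 1, sdlog = 0.4),  # seconds spent rating
  rating      = sample(1:5, 800, replace = TRUE)        # attraction rating
)

# Correlation between Rate-time and Ratings within each gender x category cell
cells <- expand.grid(gender = unique(d$gender), category = unique(d$category),
                     stringsAsFactors = FALSE)
cells$r <- mapply(function(g, k) {
  sub <- d[d$gender == g & d$category == k, ]
  cor(sub$rate_time, sub$rating)
}, cells$gender, cells$category)

# Two-cluster solution on participant-level means of Rate-time and Ratings
pm <- aggregate(cbind(rate_time, rating) ~ participant, data = d, FUN = mean)
km <- kmeans(scale(pm[, c("rate_time", "rating")]), centers = 2, nstart = 20)
table(km$cluster)
```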
2.
Test Validity and Statistical Analysis. Sargsyan, Alex, 17 September 2018.
No description available.
3.
Translation and cultural adaptation of the instrument Global Appraisal of Individual Needs - INITIAL. Claro, Heloísa Garcia, 17 December 2010.
This study aimed to translate and culturally adapt the instrument Global Appraisal of Individual Needs - Initial and to calculate its Content Validity Index. We conducted a methodological study, following the procedures recommended in the international literature. The instrument was translated into Portuguese in two independent versions, which were analyzed and combined into a synthesis of the translations; this synthesis was evaluated by a committee of four doctoral-level judges specializing in the field of alcohol and other drugs. After the judges' suggestions were incorporated into the instrument, it was back-translated, and the Portuguese version and its back-translation into English were resubmitted to the judges and to the developers of the original instrument, undergoing further changes that resulted in the final version of the instrument, the Avaliação Global das Necessidades Individuais - Inicial. The Content Validity Index of the instrument was 0.91, which the literature considers valid. We concluded that the Avaliação Global das Necessidades Individuais - Inicial is culturally adapted to the Portuguese spoken in Brazil; however, the instrument has not yet been tested with the target population, so future studies should test its reliability and validity.
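The Content Validity Index reported above (0.91) can be illustrated with a short sketch in base R. Assuming each of the four judges rates every item's relevance on a 1-4 scale and ratings of 3 or 4 count as endorsement, the item-level CVI is the proportion of judges endorsing the item, and the scale-level index is commonly taken as the average of the item-level values. The ratings below are invented; the study's actual rating form and computation may differ.

```r
# Hypothetical relevance ratings: rows = items, columns = 4 judges (1-4 scale)
ratings <- matrix(c(4, 3, 4, 4,
                    3, 4, 4, 2,
                    4, 4, 3, 4,
                    2, 3, 4, 4,
                    4, 4, 4, 4),
                  ncol = 4, byrow = TRUE)

# Item-level CVI: proportion of judges rating each item 3 or 4
i_cvi <- rowMeans(ratings >= 3)

# Scale-level CVI (averaging approach): mean of the item-level CVIs
s_cvi_ave <- mean(i_cvi)
round(cbind(i_cvi = i_cvi, s_cvi_ave = s_cvi_ave), 2)
```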
4.
MMPI-2-RF UNDERREPORTING VALIDITY SCALES IN FIREFIGHTER APPLICANTS: A CROSS-VALIDATION STUDY. Balthrop, Kullen Charles, 01 January 2018.
Identifying potential underreporting in employment evaluations is important when examining a measure's validity, and its importance increases in personnel selection for high-virtue positions (e.g., police officers and firefighters). The current study used an archival firefighter applicant sample to examine the construct validity of the Minnesota Multiphasic Personality Inventory-2-Restructured Form's (MMPI-2-RF) underreporting scales (L-r and K-r). Results were analyzed using a correlation matrix based on a modified version of the Multi-Trait Multi-Method Matrix (MTMM), as well as multiple regression and partial correlation. The present study provides additional support for the construct validity of the MMPI-2-RF's underreporting validity scales. Further research using outcome measures and alternative assessment methods would provide additional information on the efficacy of these scales.
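As one concrete illustration of the partial-correlation step mentioned above, the sketch below correlates a hypothetical L-r score with an external underreporting indicator after partialling out K-r, using the residual method in base R. All variables are simulated; this is not the study's data or its exact analysis.

```r
# Simulated scores; variable names (k_r, l_r, crit) are hypothetical stand-ins.
set.seed(42)
n    <- 300
k_r  <- rnorm(n)                           # hypothetical K-r scores
l_r  <- 0.5 * k_r + rnorm(n, sd = 0.8)     # hypothetical L-r scores
crit <- 0.4 * l_r + 0.3 * k_r + rnorm(n)   # hypothetical external indicator

# Partial correlation of L-r with the criterion, controlling for K-r:
# correlate the residuals of each variable after regressing out K-r.
res_l    <- resid(lm(l_r ~ k_r))
res_crit <- resid(lm(crit ~ k_r))
cor(res_l, res_crit)

# Multiple-regression view of the same question: the coefficient for l_r
# reflects its unique contribution with k_r held constant.
summary(lm(crit ~ l_r + k_r))
```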
5.
New approaches to measuring emotional intelligence. MacCann, Carolyn Elizabeth, January 2006.
Doctor of Philosophy (PhD). New scoring and test construction methods for emotional intelligence (EI) are suggested as alternatives to current practice, where most tests are scored by group judgment and are in ratings-based format. Both the ratings-based format and the proportion-based scores resulting from group judgments may act as method effects, obscuring relationships between EI tests, and between EI and intelligence. In addition, scoring based on standards rather than group judgments adds clarity to the meaning of test scores. For these reasons, two new measures of emotional intelligence are constructed: (1) the Situational Test of Emotional Understanding (STEU); and (2) the Situational Test of Emotion Management (STEM). Following test construction, validity evidence is collected from four multi-variate studies. The STEU’s items and a standards-based scoring system are developed according to empirically derived appraisal theory concerning the structure of emotion (Roseman, 2001). The STEM is developed as a Situational Judgment Test (SJT) with situations representing sadness, fear and anger in work life and personal life settings. Two qualitative studies form the basis for the STEM’s item development: (1) content analysis of responses to semi-structured interviews with 31 psychology undergraduates and 19 community volunteers; and (2) content analysis of free responses to targeted vignettes created from these semi-structured interviews (N = 99). The STEM may be scored according to two expert panels of emotions researchers, psychologists, therapists and life coaches (N = 12 and N = 6). In the first multi-variate study (N = 207 psychology undergraduates), both STEU and STEM scores relate strongly to vocabulary test scores and moderately to Agreeableness but no other dimension from the five-factor model of personality. STEU scores predict psychology grade and an emotionally-oriented thinking style after controlling for vocabulary and personality test scores (ΔR2 = .08 and .06 respectively). STEM scores did not predict academic achievement but did predict emotionally-oriented thinking and life satisfaction (ΔR2 = .07 and .05 for emotionally-oriented thinking and .04 for life satisfaction). In the second multi-variate study, STEU scores predict lower levels of state anxiety, and STEM scores predict lower levels of state anxiety, depression, and stress among 149 community volunteers from Sydney, Australia. In the third multi-variate study (N = 181 psychology undergraduates), Strategic EI, fluid intelligence (Gf) and crystallized intelligence (Gc) were each measured with three indicators, allowing these constructs to be assessed at the latent variable level. Nested structural equation models show that Strategic EI and Gc form separate latent factors (Δχ2(1) = 12.44, p < .001). However, these factors relate very strongly (r = .73), indicating that Strategic EI may be a primary mental ability underlying Gc. In this study, STEM scores relate to emotionally-oriented thinking but not loneliness, life satisfaction or state stress, and STEU scores do not relate to any of these. STEM scores are significantly and meaningfully higher for females (d = .80), irrespective of gender differences in verbal ability or personality, or whether expert scores are derived from male or female experts.
The fourth multi-variate study (N = 118 psychology undergraduates) distinguishes an EI latent factor (indicated by scores on the STEU, STEM and two emotion recognition ability measures) from a general cognitive ability factor (indicated by three intelligence measures; Δχ2(1) = 10.49, p < .001), although again cognitive ability and EI factors were strongly related (r = .66). Again, STEM scores were significantly higher for females (d = .44) and both STEU and STEM relate to Agreeableness but not to any other dimension from the five-factor model of personality. Taken together, results suggest that: (1) STEU and STEM scores are reasonably reliable and valid tests of EI; (2) EI tests assess slightly different constructs to existing measures of Gc, but more likely form a new primary mental ability within Gc than an entirely separate construct; and (3) the female superiority for EI tests may prove useful for addressing adverse impact in applied settings (e.g., selection for employment, promotion or educational opportunities), particularly given that many current assessment tools result in a male advantage.
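Two of the statistics reported above, the incremental ΔR2 for STEU/STEM scores and the Δχ2 test comparing nested latent-variable models, can be sketched in base R with simulated data. The variable names and numbers below are illustrative only and do not reproduce the thesis's analyses.

```r
# Simulated stand-ins; none of these values come from the thesis.
set.seed(7)
n         <- 207
vocab     <- rnorm(n)                             # vocabulary test score
agreeable <- rnorm(n)                             # Agreeableness score
steu      <- 0.5 * vocab + rnorm(n)               # STEU score
grade     <- 0.4 * vocab + 0.2 * steu + rnorm(n)  # psychology grade

# Hierarchical regression: does STEU add prediction beyond vocabulary and
# personality? Delta R-squared is the difference in R^2 between the models.
m0 <- lm(grade ~ vocab + agreeable)
m1 <- lm(grade ~ vocab + agreeable + steu)
delta_r2 <- summary(m1)$r.squared - summary(m0)$r.squared
anova(m0, m1)   # F test for the increment
delta_r2

# Chi-square difference test for nested models, as in the reported comparison
# of one- vs two-factor solutions (e.g., a difference of 12.44 on 1 df).
pchisq(12.44, df = 1, lower.tail = FALSE)
```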
6.
A Comparison of Adjacent Categories and Cumulative DSF Effect Estimators. Gattamorta, Karina Alvarez, 18 December 2009.
The study of measurement invariance in polytomous items that targets individual score levels is known as differential step functioning (DSF; Penfield, 2007, 2008). DSF methods provide specific information describing how the invariance effect manifests within particular score levels and therefore serve a diagnostic role in identifying the individual score levels involved in an item's invariance effect. The analysis of DSF requires creating a set of dichotomizations of the item response variable, and there are two primary approaches for doing so. The first, the adjacent categories approach, is consistent with the dichotomization scheme underlying the generalized partial credit model (GPCM; Muraki, 1992) and considers each pair of adjacent score levels while treating the other score levels as missing. The second, the cumulative approach, is consistent with the dichotomization scheme underlying the graded response model (GRM; Samejima, 1997) and includes data from every score level in each dichotomization. To date, there is limited research on how the cumulative and adjacent categories approaches compare within the context of DSF, particularly as applied to a real data set, and the understanding of how interpretation and practical outcomes may differ between the two approaches is also limited. The current study addressed these two issues. It evaluated the results of a DSF analysis using both the adjacent categories and cumulative dichotomization schemes to determine whether the two approaches yield similar results and interpretations of DSF. The approaches were applied to data from a polytomously scored alternate assessment administered to children with significant cognitive disabilities. The DSF analyses revealed that the two approaches generally led to consistent results, particularly where DSF effects were negligible. For steps where significant DSF was present, the two approaches generally guided analysts to the same location within the item. However, several aspects of the results raised questions about the use of the adjacent categories dichotomization scheme. First, the adjacent categories method appeared to lack independence across steps, since a large DSF effect at one step was often paired with a large DSF effect in the opposite direction at the previous step. Additionally, when a substantial DSF effect existed, it was more likely to be significant under the cumulative approach than under the adjacent categories approach, likely because of the smaller standard errors and greater stability of the cumulative approach. In sum, the results indicate that the cumulative approach is preferable to the adjacent categories approach when conducting a DSF analysis.
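The two dichotomization schemes contrasted above can be made concrete with a short base R sketch. For a polytomous item scored 0-3, the adjacent categories approach forms step k by keeping only responses of k-1 and k (all other responses treated as missing), while the cumulative approach codes every response as below versus at-or-above k. The response vector below is invented for illustration.

```r
# Hypothetical responses to one polytomous item scored 0-3
x <- c(0, 1, 3, 2, 2, 0, 3, 1, 2, 3)

# Adjacent categories dichotomization for step k: only scores k-1 and k are
# retained (coded 0/1); all other responses are treated as missing.
adjacent_step <- function(x, k) ifelse(x %in% c(k - 1, k), as.integer(x == k), NA)

# Cumulative dichotomization for step k: every response is coded
# 0 (below k) or 1 (at or above k), so no responses are discarded.
cumulative_step <- function(x, k) as.integer(x >= k)

sapply(1:3, function(k) adjacent_step(x, k))    # three adjacent-category variables
sapply(1:3, function(k) cumulative_step(x, k))  # three cumulative variables
```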
7.
Developing and validating self-report instruments: assessing perceived driver competence. Sundström, Anna, January 2009.
The overall aim of this thesis was to develop and validate a self-report instrument for perceived driver competence. The thesis includes six papers and a summary. All papers approach perceived driver competence from a measurement perspective; that is, how to develop an instrument for perceived driver competence and how to use and interpret its scores in a reliable and valid manner. Study I reviews how perceived driver competence has been measured in other studies and discusses these methods from a measurement perspective. Most studies have examined perceived driver competence by asking drivers to compare their own skill to that of the average driver. That method is problematic, since it is not possible to determine whether drivers are overconfident when empirical information about their actual skills is missing. To examine whether drivers overestimate their skills, perceived driver competence should be compared with actual driving performance. Study II reports on the development and psychometric evaluation of a self-report instrument for perceived driver competence, the Self-Efficacy Scale for Driver Competence (SSDC). The findings provide support for construct validity, as the SSDC demonstrated sound psychometric properties and its internal structure corresponded to the theoretical model used as a basis for instrument development. In Study III, the psychometric properties of the SSDC were further examined using an item response theory (IRT) model. The findings confirmed the results of the classical analyses in Study II, and the IRT analyses provided additional information, indicating that the scale would benefit from fewer scale points or from labeling each scale point. In Study IV, Swedish and Finnish candidates' self-assessment accuracy was examined by comparing their scores on the SSDC, and on a similar instrument for self-assessment of driving skill used in Finland, with driving test performance. Unlike in previous studies, in which drivers compared their perceived skills to those of the average driver, a relatively large proportion made a realistic assessment of their own skills. In addition, in contrast to previous studies, no gender differences were found. These results were confirmed in Study V, where the results from the Finnish instrument for self-assessment of driving skill were compared with those from a similar instrument used in the Netherlands. Study VI further examined the construct validity of a revised version of the SSDC, combining qualitative and quantitative sources of evidence. There was a strong relationship between the SSDC and an instrument for self-assessment of driving skills, providing support for convergent validity. No relationship was found between the SSDC and driving test performance. Semi-structured interviews suggested explanations for this lack of relationship: confidence in performing different tasks in the test is different from confidence in passing the test, and candidates are familiar neither with assessing their own skills nor with the requirements for passing the test. In conclusion, the results from this thesis indicate that the choice of methods for assessing perceived driver competence, as well as the quality of those methods, affects validity. The results provided support for different aspects of the construct validity of the SSDC.
Moreover, the findings illustrated the benefits of combining different methods in test validation, as each method contributed information about the validity of the SSDC. The studies in this thesis mainly examined internal and external aspects of construct validity. Future studies should examine procedural validity of the SSDC.
8.
Designing Software to Unify Person-Fit Assessment. Pfleger, Phillip Isaac, 10 December 2020.
Item response theory (IRT) assumes that the model fits the data. One commonly overlooked aspect of model-fit assessment is the examination of person fit, or person-fit assessment (PFA). One reason PFA lacks popularity among psychometricians is that no comprehensive software exists. This dissertation outlines the development and testing of a new software package, called wizirt, that will begin to meet this need. The package provides a wide gamut of tools to the user but is currently limited to unidimensional, dichotomous, parametric models. wizirt is built in the open-source language R, where it combines the capabilities of a number of other R packages under a single syntax. In addition to the package itself, I have created a number of resources to help users learn to use it, including support for individuals who have never used R before as well as for more experienced R users.
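The abstract does not describe wizirt's API, so rather than guess at it, the sketch below illustrates the kind of parametric person-fit statistic such software computes for unidimensional dichotomous models: the standardized log-likelihood statistic lz, written in base R. The 2PL item parameters and the response vector are invented for illustration.

```r
# Illustrative person-fit sketch (not the wizirt API): the standardized
# log-likelihood statistic lz under a 2PL model with known item parameters.
lz_statistic <- function(u, a, b, theta) {
  p  <- plogis(a * (theta - b))                  # 2PL response probabilities
  l0 <- sum(u * log(p) + (1 - u) * log(1 - p))   # observed log-likelihood
  e  <- sum(p * log(p) + (1 - p) * log(1 - p))   # expected log-likelihood
  v  <- sum(p * (1 - p) * log(p / (1 - p))^2)    # variance of the log-likelihood
  (l0 - e) / sqrt(v)                             # large negative values flag misfit
}

# Hypothetical item parameters and one examinee's scored responses
a     <- c(1.2, 0.8, 1.5, 1.0, 0.9)    # discriminations
b     <- c(-1.0, -0.5, 0.0, 0.5, 1.0)  # difficulties
u     <- c(1, 1, 0, 1, 0)              # item responses (1 = correct)
theta <- 0.2                           # estimated ability

lz_statistic(u, a, b, theta)
```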