41

The Development of Career Self-Efficacy Questionnaire

Chang, Hsuan-Chih 31 July 2012 (has links)
The purpose of this study was to develop a questionnaire to measure career self-efficacy in undergraduates. The theoretical framework of the questionnaire was based on Bandura's self-efficacy theory. A total of 409 participants were selected by judgment sampling from first- and second-year undergraduates of six colleges at National Sun Yat-sen University. The newly developed Career Efficacy and Motivation Questionnaire (CEMQ) was adapted from Taylor and Betz's CDMSE scale. Content validity was reviewed by three experts, and the data were analyzed with the rating scale model (RSM) in ConQuest. After removing the items that did not fit the model, seventy-two items were retained in the CEMQ.
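As a rough illustration of the rating scale model mentioned above (the study itself fit the model in ConQuest), the following sketch computes RSM category probabilities for a single Likert-type item in plain numpy; the function name and parameter values are illustrative only, not taken from the CEMQ.

```python
import numpy as np

def rsm_category_probs(theta, delta, taus):
    """Rating scale model (Andrich): probabilities of responding in each
    category 0..m for a person with ability `theta` on an item with
    difficulty `delta` and thresholds `taus` (length m, shared across items)."""
    # Cumulative sums of (theta - delta - tau_k), with category 0 fixed at 0.
    steps = theta - delta - np.asarray(taus, float)
    numerators = np.exp(np.concatenate(([0.0], np.cumsum(steps))))
    return numerators / numerators.sum()

# Example: a 5-point Likert item (4 thresholds), answered by a person of ability 0.5.
probs = rsm_category_probs(theta=0.5, delta=0.2, taus=[-1.5, -0.5, 0.5, 1.5])
print(probs.round(3), probs.sum())  # probabilities over categories 0..4, summing to 1
```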
42

Making Diagnostic Thresholds Less Arbitrary

Unger, Alexis Ariana May 2011 (has links)
The application of diagnostic thresholds plays an important role in the classification of mental disorders. Despite their importance, many diagnostic thresholds are set arbitrarily, without much empirical support. This paper introduces and analyzes a new, empirically based way of setting diagnostic thresholds for a category of mental disorders that has historically had arbitrary thresholds: the personality disorders (PDs). I analyzed data from over 2,000 participants who were part of the Methods to Improve Diagnostic Assessment and Services (MIDAS) database. Results revealed that functional outcome scores, as measured by Global Assessment of Functioning (GAF) scores, could be used to identify diagnostic thresholds and that the optimal thresholds varied somewhat by personality disorder (PD) along the spectrum of latent severity. Using the item response theory (IRT)-based approach, the optimal thresholds along the spectrum of latent severity for the different PDs ranged from θ = 1.50 to 2.25. Effect sizes using the IRT-based approach ranged from .34 to 1.55. These findings suggest that linking diagnostic thresholds to functional outcomes, and thereby making them less arbitrary, is an achievable goal. This study introduces a new and uncomplicated way to set diagnostic thresholds empirically while also taking into consideration that items within diagnostic sets may function differently. Although this is purely an initial demonstration meant only to serve as an example, the approach suggests that diagnostic thresholds for all disorders could one day be set on an empirical basis.
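The abstract does not spell out how the optimal thresholds were located, but one way to picture the general idea of linking a latent-severity cutoff to functional outcome is sketched below: for each candidate θ cutoff, compute the standardized mean difference (Cohen's d) in GAF scores between people below and at or above the cutoff. This is a hypothetical, simplified reading of the approach, not the study's actual procedure.

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def outcome_effect_at_cutoffs(theta, gaf, cutoffs):
    """For each candidate latent-severity cutoff, the effect size of the difference
    in functional outcome (GAF) between people below vs. at/above the cutoff."""
    return {c: cohens_d(gaf[theta < c], gaf[theta >= c]) for c in cutoffs}

# Toy data: higher latent severity tends to go with lower functioning.
rng = np.random.default_rng(0)
theta = rng.normal(size=2000)
gaf = 60 - 8 * theta + rng.normal(scale=10, size=2000)
print(outcome_effect_at_cutoffs(theta, gaf, cutoffs=[1.50, 1.75, 2.00, 2.25]))
```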
43

Detecting Aberrant Responding on Unidimensional Pairwise Preference Tests: An Application Based on the Zinnes-Griggs Ideal Point IRT Model

Lee, Philseok 01 January 2013 (has links)
This study investigated the efficacy of the lz person-fit statistic for detecting aberrant responding with unidimensional pairwise preference (UPP) measures, constructed and scored based on the Zinnes-Griggs (ZG, 1974) IRT model, which has been used for a variety of recent noncognitive testing applications. Because UPP measures are used to collect both "self-" and "other-" reports, I explored the capability of lz to detect two of the most common and potentially detrimental response sets, namely fake-good and random responding. The effectiveness of lz was studied using empirical and theoretical critical values for classification, along with test length, test information, the type of statement parameters, and the percentage of items answered aberrantly (20%, 50%, 100%). lz was ineffective in detecting fake-good responding, with power approaching zero in the 100% aberrance conditions. However, lz was highly effective in detecting random responding, with power approaching 1.0 in long-test, high-information conditions, and there was no diminution in efficacy when marginal maximum likelihood estimates of statement parameters were used in place of the true values. Although using empirical critical values for classification provided slightly higher power and more accurate Type I error rates, theoretical critical values, corresponding to a standard normal distribution, provided nearly as good results.
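For readers unfamiliar with lz, the sketch below shows the statistic in its standard dichotomous form: the observed response log-likelihood standardized by its expectation and variance under the model. The study applies lz under the Zinnes-Griggs ideal point model for pairwise preferences, whose likelihood is more involved; this simplified version only illustrates the logic of the index.

```python
import numpy as np

def lz_person_fit(responses, p):
    """Standardized log-likelihood person-fit statistic l_z for dichotomous
    responses, given model-implied probabilities `p` of the keyed response.
    Large negative values flag response patterns unlikely under the model."""
    responses, p = np.asarray(responses, float), np.asarray(p, float)
    loglik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (loglik - expected) / np.sqrt(variance)

# Example: a pattern consistent with the model vs. a random-looking one.
p = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2])
print(lz_person_fit([1, 1, 1, 1, 0, 0, 0], p))  # near zero: consistent with the model
print(lz_person_fit([0, 1, 0, 0, 1, 1, 1], p))  # strongly negative: aberrant
```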
44

An investigation of the optimal test design for multistage tests using the generalized partial credit model

Chen, Ling-Yin 27 January 2011 (has links)
Although the design of multistage testing (MST) has received increasing attention, previous studies mostly compared the psychometric properties of MST with those of computerized adaptive testing (CAT) and paper-and-pencil (P&P) tests. Few studies have systematically examined the number of items in the routing test, the number of subtests in a stage, or the number of stages in a test design needed to achieve accurate measurement in MST. Given that no studies have identified an ideal MST design using polytomously scored items, the current study conducted a simulation to investigate the optimal design for MST using the generalized partial credit model (GPCM). Eight test designs were examined for ability estimation across two routing test lengths (short and long) and two total test lengths (short and long). The item pool and generated item responses were based on items calibrated from a national test consisting of 273 partial credit items. Across all test designs, the maximum-information routing method was employed and maximum likelihood estimation was used for ability estimation. Ten samples of 1,000 simulees were used to assess each test design. The performance of each test design was evaluated in terms of the precision of ability estimates, item exposure rate, item pool utilization, and item overlap. The study found that all test designs produced very similar results. Although there were some variations among the eight test structures in the ability estimates, their overall performance in achieving measurement precision did not substantially deviate from one another with regard to total test length and routing test length. However, results from the present study suggest that routing test length does have a significant effect on the number of non-convergent cases in MST tests. Short routing tests tended to result in more non-convergent cases, and structures with fewer stages yielded more such cases than structures with more stages. Overall, unlike previous findings, the results of the present study indicate that the MST test structure is less likely to be a factor impacting ability estimation when polytomously scored items are used, based on the GPCM.
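As a sketch of the machinery involved, the code below computes GPCM category probabilities and Fisher information for an item, and uses summed module information at a provisional ability estimate to implement maximum-information routing. The item parameters and module composition are made up for illustration and are not the calibrated items from the national test.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Generalized partial credit model category probabilities for one item.
    `a` is the slope, `b` the step difficulties b_1..b_m; categories are 0..m."""
    steps = a * (theta - np.asarray(b, float))
    numerators = np.exp(np.concatenate(([0.0], np.cumsum(steps))))
    return numerators / numerators.sum()

def gpcm_item_information(theta, a, b):
    """Fisher information of a GPCM item: a^2 times the variance of the category score."""
    p = gpcm_probs(theta, a, b)
    k = np.arange(len(p))
    return a ** 2 * (np.sum(k ** 2 * p) - np.sum(k * p) ** 2)

def route_to_module(theta_hat, modules):
    """Maximum-information routing: pick the module whose items carry the most
    information at the provisional ability estimate."""
    info = [sum(gpcm_item_information(theta_hat, a, b) for a, b in items) for items in modules]
    return int(np.argmax(info))

# Two illustrative second-stage modules of three partial-credit items each.
easy = [(1.0, [-1.5, -0.5]), (0.8, [-2.0, -1.0]), (1.2, [-1.0, 0.0])]
hard = [(1.0, [0.5, 1.5]), (0.9, [0.0, 1.0]), (1.1, [1.0, 2.0])]
print(route_to_module(theta_hat=-0.8, modules=[easy, hard]))  # 0: routed to the easy module
print(route_to_module(theta_hat=1.2, modules=[easy, hard]))   # 1: routed to the hard module
```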
45

An evaluation of item difficulty and person ability estimation using the multilevel measurement model with short tests and small sample sizes

Brune, Kelly Diane 08 June 2011 (has links)
Recently, researchers have reformulated Item Response Theory (IRT) models into multilevel models to evaluate clustered data appropriately. Using a multilevel model to obtain item difficulty and person ability parameter estimates that correspond directly with IRT models' parameters is often referred to as multilevel measurement modeling. Unlike conventional IRT models, multilevel measurement models (MMM) can handle the addition of predictor variables and appropriate modeling of clustered data, and they can be estimated using non-specialized computer software, including SAS. For example, a three-level model can model the repeated measures (level one) of individuals (level two) who are clustered within schools (level three). The minimum sample size and number of test items that permit reasonable recovery of one-parameter logistic (1-PL) IRT model parameters have not been examined for either the two- or three-level MMM. Researchers (Wright and Stone, 1979; Lord, 1983; Hambleton and Cook, 1983) have found that sample sizes under 200 and fewer than 20 items per test result in poor model fit and poor parameter recovery for dichotomous 1-PL IRT models with data that meet model assumptions. This simulation study tested the performance of the two-level and three-level MMM under conditions that included three sample sizes (100, 200, and 400), three test lengths (5, 10, and 20), three level-3 cluster sizes (10, 20, and 50), and two generated intraclass correlations (.05 and .15). The study demonstrated that the two- and three-level MMMs lead to somewhat divergent results for item difficulty and person-level ability estimates. The mean relative item difficulty bias was lower for the three-level model than the two-level model. The opposite was true for the person-level ability estimates, with a smaller mean relative parameter bias for the two-level model than the three-level model. There was no difference between the two- and three-level MMMs in the school-level ability estimates. Modeling clustered data appropriately, having a minimum total sample size of 100 to accurately estimate level-2 residuals and of 400 to accurately estimate level-3 residuals, and having at least 20 items will help ensure valid statistical test results.
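A minimal sketch of the "mean relative parameter bias" outcome used to compare the two- and three-level models is shown below; the toy numbers are illustrative, not the study's simulated conditions.

```python
import numpy as np

def mean_relative_bias(estimates, true_values):
    """Mean relative bias across replications: (estimate - true) / true for each
    parameter, averaged over replications and parameters.
    `estimates` has shape (replications, parameters)."""
    estimates = np.asarray(estimates, float)
    true_values = np.asarray(true_values, float)
    return ((estimates - true_values) / true_values).mean()

# Toy example: 10 replications of 5 item-difficulty estimates.
rng = np.random.default_rng(1)
true_b = np.array([-1.0, -0.5, 0.25, 0.75, 1.5])
est_b = true_b + rng.normal(scale=0.15, size=(10, 5))
print(mean_relative_bias(est_b, true_b))
```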
46

Effects of sample size, ability distribution, and the length of Markov Chain Monte Carlo burn-in chains on the estimation of item and testlet parameters

Orr, Aline Pinto 25 July 2011 (has links)
Item Response Theory (IRT) models are the basis of modern educational measurement. In order to increase testing efficiency, modern tests make ample use of groups of questions associated with a single stimulus (testlets). This violates the IRT assumption of local independence. However, a set of measurement models, testlet response theory (TRT), has been developed to address such dependency issues. This study investigates the effects of varying sample sizes and Markov Chain Monte Carlo burn-in chain lengths on the accuracy of estimation of a TRT model's item and testlet parameters. The following outcome measures are examined: descriptive statistics, Pearson product-moment correlations between known and estimated parameters, and indices of measurement effectiveness for final parameter estimates.
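The testlet model itself requires specialized estimation, but the role of the burn-in length can be illustrated with a toy random-walk Metropolis sampler: posterior summaries computed from a chain started far from the target improve as more early draws are discarded. This is only a generic MCMC illustration, not the TRT model used in the study.

```python
import numpy as np

def metropolis_chain(log_density, start, n_draws, step=0.5, seed=0):
    """A minimal random-walk Metropolis sampler producing one MCMC chain."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_draws)
    current, current_logp = start, log_density(start)
    for i in range(n_draws):
        proposal = current + rng.normal(scale=step)
        proposal_logp = log_density(proposal)
        if np.log(rng.uniform()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        chain[i] = current
    return chain

# Target: a standard normal posterior; the chain starts far away, so burn-in matters.
chain = metropolis_chain(lambda x: -0.5 * x ** 2, start=8.0, n_draws=5000)
for burn_in in (0, 100, 500, 1000):
    kept = chain[burn_in:]
    print(f"burn-in {burn_in:4d}: mean {kept.mean():+.3f}, sd {kept.std(ddof=1):.3f}")
```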
47

Aspects of Modeling Fraud Prevention of Online Financial Services

Dan, Gorton January 2015 (has links)
Banking and online financial services are part of our critical infrastructure. As such, they comprise an Achilles heel in society and need to be protected accordingly. The last ten years have seen a steady shift from traditional show-off hacking towards cybercrime with great economic consequences for society. The different threats against online services are getting worse, and risk management with respect to denial-of-service attacks, phishing, and banking Trojans is now part of the agenda of most financial institutions. This trend is overseen by responsible authorities who step up their minimum requirements for risk management of financial services and, among other things, require regular risk assessment of current and emerging threats. For the financial institution, this situation creates a need to understand all parts of the incident response process of the online services, including the technology, sub-processes, and the resources working with online fraud prevention. The effectiveness of each countermeasure has traditionally been measured for one technology at a time, for example leaving the fraud prevention manager with separate values for the effectiveness of authentication, intrusion detection, and fraud prevention. In this thesis, we address two problems with this situation. Firstly, there is a need for a tool which is able to model current countermeasures in light of emerging threats. Secondly, the development process of fraud detection is hampered by the lack of accessible data. In the main part of this thesis, we highlight the importance of looking at the “big risk picture” of the incident response process, and not just focusing on one technology at a time. In the first article, we present a tool which makes it possible to measure the effectiveness of the incident response process. We call this an incident response tree (IRT). In the second article, we present additional scenarios relevant for risk management of online financial services using IRTs. Furthermore, we introduce a complementary model which is inspired by existing models used for measuring credit risks. This enables us to compare different online services using two measures, which we call Expected Fraud and Conditional Fraud Value at Risk. Finally, in the third article, we create a simulation tool which enables us to use scenario-specific results together with models like return on security investment to support decisions about future security investments. In the second part of the thesis, we develop a method for producing realistic-looking data for testing fraud detection. In the fourth article, we introduce multi-agent-based simulations together with social network analysis to create data which can be used to fine-tune fraud prevention, and in the fifth article, we continue this effort by adding a platform for testing fraud detection.
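The thesis's exact definitions of Expected Fraud and Conditional Fraud Value at Risk are given in the second article; the sketch below shows one plausible, credit-risk-inspired reading of such measures over a set of simulated fraud-loss scenarios (the mean loss, and the mean loss in the worst tail beyond a chosen quantile). The function names and the loss distribution are assumptions made for illustration.

```python
import numpy as np

def expected_fraud(losses):
    """Mean fraud loss over the simulated scenarios."""
    return float(np.mean(losses))

def conditional_fraud_value_at_risk(losses, level=0.99):
    """Mean loss in the worst (1 - level) tail of the scenarios
    (an expected-shortfall-style measure)."""
    losses = np.asarray(losses, float)
    threshold = np.quantile(losses, level)
    return float(losses[losses >= threshold].mean())

# Toy scenario set: routine fraud on most days, rare large coordinated attacks.
rng = np.random.default_rng(7)
losses = rng.lognormal(mean=8, sigma=1.2, size=10_000)
attack = rng.random(10_000) < 0.01
losses[attack] += rng.lognormal(mean=12, sigma=0.5, size=attack.sum())
print(expected_fraud(losses), conditional_fraud_value_at_risk(losses, level=0.99))
```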
48

Improvements for Differential Functioning of Items and Tests (DFIT): Investigating the Addition of Reporting an Effect Size Measure and Power

Wright, Keith D 07 May 2011 (has links)
Standardized testing has been part of the American educational system for decades, and it has been controversial from the beginning and remains so today. Given the current federal educational policies supporting increased standardized testing, psychometricians, educators, and policy makers must seek ways to ensure that tests are not biased towards one group over another. In measurement theory, if a test item behaves differently for two different groups of examinees, the item is said to exhibit differential item functioning (DIF). Differential item functioning, often conceptualized in the context of item response theory (IRT), describes test items that may favor one group over another after examinees are matched on ability. It is important to determine whether an item functions significantly differently for one group than another, regardless of why. Hypothesis testing is used to identify statistically significant DIF items; an effect size measure quantifies the magnitude of a statistically significant difference. This study investigated the addition of an effect size measure for the differential functioning of items and tests (DFIT) framework's noncompensatory differential item functioning (NCDIF) index, along with reporting empirically observed power. The Mantel-Haenszel (MH) parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. In addition, by modifying NCDIF's unique method for determining statistical significance, NCDIF becomes the first DIF statistic for test items for which, in addition to an effect size measure, empirical power can also be reported. Furthermore, this study added substantially to the body of literature on effect size by also investigating the behavior of two other DIF measures, the Simultaneous Item Bias Test (SIBTEST) and the area measure. Finally, this study makes a significant contribution by verifying, in a large-scale simulation study, the accuracy of software developed by Roussos, Schnipke, and Pashley (1999) to calculate the true MH parameter. The accuracy of this software had not been previously verified.
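Since the Mantel-Haenszel parameter serves as the benchmark here, a minimal sketch of the MH common odds ratio and the ETS delta metric (Δ_MH = −2.35 ln α_MH, with |Δ| roughly below 1 treated as negligible and above 1.5 as large DIF) may help; the counts below are invented for illustration.

```python
import numpy as np

def mantel_haenszel_delta(strata):
    """Mantel-Haenszel common odds ratio and the ETS delta metric for one item.
    `strata` is a list of (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    counts, one tuple per matched ability level."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha_mh = num / den
    delta_mh = -2.35 * np.log(alpha_mh)
    return alpha_mh, delta_mh

# Toy item: the reference group answers correctly a bit more often at every ability level.
strata = [(40, 60, 30, 70), (60, 40, 50, 50), (80, 20, 72, 28)]
alpha, delta = mantel_haenszel_delta(strata)
print(round(alpha, 2), round(delta, 2))  # alpha > 1, delta < 0: item favors the reference group
```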
49

Algorithms for assessing the quality and difficulty of multiple choice exam questions

Luger, Sarah Kaitlin Kelly January 2016 (has links)
Multiple Choice Questions (MCQs) have long been the backbone of standardized testing in academia and industry. Correspondingly, there is a constant need for the authors of MCQs to write and refine new questions for new versions of standardized tests, as well as to support measuring performance in the emerging massive open online courses (MOOCs). Research that explores what makes a question difficult, or which questions distinguish higher-performing students from lower-performing students, can aid in the creation of the next generation of teaching and evaluation tools. In the automated MCQ answering component of this thesis, algorithms query for definitions of scientific terms, process the returned web results, and compare the returned definitions to the original definition in the MCQ. This automated method for answering questions is then augmented with a model, based on human performance data from crowdsourced question sets, for analysis of question difficulty as well as the discrimination power of the non-answer alternatives. The crowdsourced question sets come from PeerWise, an open-source online college-level question authoring and answering environment. The goal of this research is to create an automated method to both answer and assess the difficulty of multiple choice inverse definition questions in the domain of introductory biology. The results of this work suggest that human-authored question banks provide useful data for building gold-standard human performance models. The methodology for building these performance models has value in other domains that test the difficulty of questions and the quality of the exam takers.
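The thesis's answering pipeline compares web-retrieved definitions against the MCQ alternatives; a minimal stand-in for that comparison step is a bag-of-words cosine similarity that picks the alternative closest to the retrieved definition, as sketched below. The tokenization and similarity measure here are simplifying assumptions, not the thesis's actual method.

```python
import re
from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two short texts."""
    tokens_a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    tokens_b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(tokens_a[t] * tokens_b[t] for t in set(tokens_a) & set(tokens_b))
    norm = sqrt(sum(v * v for v in tokens_a.values())) * sqrt(sum(v * v for v in tokens_b.values()))
    return dot / norm if norm else 0.0

def pick_answer(retrieved_definition, alternatives):
    """Choose the MCQ alternative most similar to the retrieved definition."""
    scores = {alt: cosine_similarity(retrieved_definition, alt) for alt in alternatives}
    return max(scores, key=scores.get), scores

retrieved = "An organelle that produces ATP through cellular respiration in eukaryotic cells."
alternatives = [
    "the mitochondrion, which generates ATP by cellular respiration",
    "the ribosome, which assembles proteins from amino acids",
    "the nucleus, which stores the cell's genetic material",
]
print(pick_answer(retrieved, alternatives)[0])
```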
50

Assessment of Competencies Among Doctoral Trainees in Psychology

Price, Samantha 08 1900 (has links)
The recent shift to a culture of competence has permeated several areas of professional psychology, including competency identification, competency-based education and training, and competency assessment. A competency framework has also been applied to various programs and specialty areas within psychology, such as clinical, counseling, clinical health, school, cultural diversity, neuro-, gero-, child, and pediatric psychology. Despite the spread of the competency focus throughout psychology, few standardized measures of competency assessment have been developed. To the author's knowledge, only four published studies on measures of competency assessment in psychology currently exist. While these measures represent significant steps in advancing the assessment of competence, three of them were designed for use with individual programs, two of these international (i.e., in the UK and Taiwan). The current study applied the seminal Competency Benchmarks, via a recently adapted benchmarks form (i.e., the Practicum Evaluation Form; PEF), to practicum students at the University of North Texas. In addition to traditional supervisor ratings, the present study also involved self-, peer supervisor, and peer supervisee ratings to provide 360-degree evaluations. Item response theory (IRT) was used to evaluate the psychometric properties of the PEF and to inform potential revisions of this form. Supervisor ratings of competency were found to fit the specified Rasch model, lending support to the use of the benchmarks framework as assessed by this form. Self- and peer-ratings were significantly correlated with supervisor ratings, indicating that there may be some utility to 360-degree evaluations. Finally, as predicted, foundational competencies were rated significantly higher than functional competencies, and competencies improved significantly with training. Results of the current study provide clarity about the utility of the PEF and inform our understanding of practicum-level competencies.
