About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. It is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

An Empirical Examination of the Impact of Item Parameters on IRT Information Functions in Mixed Format Tests

Lam, Wai Yan Wendy 01 February 2012 (has links)
IRT, also referred to as "modern test theory," offers many advantages over CTT-based methods in test development. In particular, an IRT information function makes it possible to build a test with the desired precision of measurement at any point on a defined proficiency scale, provided a sufficient number of test items is available. This feature is extremely useful when the information is used for decision making, for instance, deciding whether an examinee attains a certain mastery level. Computerized adaptive testing (CAT) is one of many applications of IRT information functions in test construction. The purposes of this study were as follows: (1) to examine the consequences of improving test quality through the addition of more discriminating items of different item formats; (2) to examine the effect of a test whose difficulty does not align with the ability level of the intended population; (3) to investigate the resulting changes in decision consistency and decision accuracy; and (4) to understand changes in expected information when test quality is either improved or degraded, using both empirical and simulated data. The main findings were as follows: (1) increasing the discriminating power of any type of item generally increased the level of information, although it sometimes had an adverse effect at the extreme ends of the ability continuum; (2) it was important to have more items targeted at the population of interest; no matter how good the items were, they were of limited value in test development when they were not targeted to the distribution of candidate ability or to the cutscores; (3) decision consistency (DC), the Kappa statistic, and decision accuracy (DA) increased with better quality items; (4) DC and Kappa were negatively affected when the difficulty of the test did not match the ability of the intended population, although the effect was less severe if the test was easier than needed; (5) tests with more high-quality items lowered the false positive (FP) and false negative (FN) rates at the cutscores; (6) when test difficulty did not match the ability of the target examinees, both FP and FN rates generally increased; (7) polytomous items tended to yield more information than dichotomously scored items, regardless of the discrimination parameter and difficulty of the item; and (8) the more score categories an item had, the more information it could provide. The findings should help testing agencies and practitioners better understand the impact of item parameters on item and test information functions. This understanding is crucial for improving item bank quality and, ultimately, for building tests that provide more accurate proficiency classifications. At the same time, item writers should keep in mind that the item information function is merely a statistical tool for building a good test; other criteria, such as content balancing and content validity, should also be considered.
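The relationship between item discrimination and information described in this abstract follows directly from the standard item information functions. The sketch below uses the textbook Birnbaum formulas for the 2PL/3PL models with purely hypothetical item parameters; it illustrates the general behavior and is not a reproduction of the thesis analyses.

    import numpy as np

    def p_3pl(theta, a, b, c):
        """3PL probability of a correct response; setting c = 0 gives the 2PL."""
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    def item_info_3pl(theta, a, b, c):
        """Birnbaum's item information; reduces to a^2 * P * (1 - P) when c = 0."""
        p = p_3pl(theta, a, b, c)
        return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

    theta = np.linspace(-4, 4, 161)
    # Hypothetical items: identical difficulty and guessing, increasing discrimination.
    items = [(0.8, 0.0, 0.2), (1.2, 0.0, 0.2), (2.0, 0.0, 0.2)]
    for a, b, c in items:
        info = item_info_3pl(theta, a, b, c)
        print(f"a={a:.1f}: peak info {info.max():.2f} near theta={theta[info.argmax()]:.2f}, "
              f"info at theta=-3: {item_info_3pl(-3.0, a, b, c):.4f}")
    # The test information function is simply the sum over items.
    test_info = sum(item_info_3pl(theta, a, b, c) for a, b, c in items)

With these hypothetical parameters, the most discriminating item gives the highest peak information near its difficulty but the least information at theta = -3, mirroring finding (1) above.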
2

Potential test information for multidimensional tests

Jonas, Katherine Grace 01 August 2017 (has links)
Test selection in psychological assessment is guided, both explicitly and implicitly, by how informative tests are with regard to a trait of interest. Most existing formulations of test information are sensitive to subpopulation variation, with the result that test information varies from sample to sample. Recently, measures of test information have been developed that quantify the potential informativeness of a test. These indices are defined by the properties of the test itself, as distinct from the properties of the sample or examinee. As yet, however, measures of potential information have been developed only for unidimensional tests. In practice, psychological tests are often multidimensional, and multidimensional tests are often used to estimate one specific trait among many. This study develops measures of potential test information for multidimensional tests, as well as measures of marginal potential test information, that is, test information with regard to one trait within a multidimensional test. In Study 1, the performance of the metrics was tested in data simulated from unidimensional, first-order multidimensional, second-order, and bifactor models. In Study 2, measures of marginal and multidimensional potential test information were applied to a set of neuropsychological data collected as part of Rush University's Memory and Aging Project. In the simulated data, marginal and multidimensional potential test information were sensitive to the changing dimensionality of the test. In the observed neuropsychological data, five traits were identified, and verbal abilities were most closely correlated with probable dementia. Both indices of marginal potential test information identified the Mini Mental Status Exam as the best measure of that trait. More broadly, greater marginal potential test information calculated with regard to verbal abilities was associated with greater criterion validity. These measures allow for the direct comparison of two multidimensional tests that assess the same trait, facilitating test selection and improving the precision and validity of psychological assessment.
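As background for what "information about one trait within a multidimensional test" means at a single ability point, the sketch below computes the ordinary Fisher information matrix for a compensatory multidimensional 2PL and reads off the precision for one trait while treating the other as a nuisance dimension. The item parameters are hypothetical, and this pointwise, sample-dependent quantity is not the potential-information index the thesis develops; it only illustrates the marginal-information idea.

    import numpy as np

    def m2pl_prob(theta, a, d):
        """Compensatory multidimensional 2PL: P = logistic(a . theta + d)."""
        return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

    def test_info_matrix(theta, A, d):
        """Sum of per-item Fisher information matrices P*(1-P)*a*a^T at theta."""
        info = np.zeros((len(theta), len(theta)))
        for a_i, d_i in zip(A, d):
            p = m2pl_prob(theta, a_i, d_i)
            info += p * (1.0 - p) * np.outer(a_i, a_i)
        return info

    # Hypothetical two-trait test: three items per trait plus one mixed item.
    A = np.array([[1.2, 0.0], [0.9, 0.0], [1.5, 0.0],
                  [0.0, 1.1], [0.0, 0.8], [0.0, 1.4],
                  [0.7, 0.7]])
    d = np.zeros(len(A))
    I = test_info_matrix(np.array([0.0, 0.0]), A, d)
    # Marginal precision for trait 1 when trait 2 must also be estimated:
    marginal_info_trait1 = 1.0 / np.linalg.inv(I)[0, 0]
    print(I.round(3), round(marginal_info_trait1, 3))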
3

A Study on Developing a Spatial Ability Test for Myanmar Middle School Students

ISHII, Hidetoki, YAMADA, Tsuyoshi, KHAING, Nu Nu 18 January 2012 (has links)
No description available.
4

A Comparative Analysis of Two Forms of Gyeonggi English Communicative Ability Test Based on Classical Test Theory and Item Response Theory

Yoon, Young-Beol 16 March 2012 (has links) (PDF)
This study is an empirical analysis of the 2009 and 2010 forms of the Gyeonggi English Communicative Ability Test (GECAT), based on the responses of 2,307 students to the 2009 GECAT and 2,907 students to the 2010 GECAT. The GECAT is an English proficiency examination sponsored by the Gyeonggi Provincial Office of Education (GOE) in South Korea. This multiple-choice test has been administered annually at the end of each school year to high school students since 2004 as a measure of the students' ability to communicate in English. From 2004 until 2009 the test included 80 multiple-choice items, but in 2010 the test was shortened to 50 items. The purpose of this study was to compare the psychometric properties of the 80-item 2009 form with those of the shorter 50-item 2010 form, using both Classical Test Theory item analysis statistics and parameter estimates obtained from the 3-PL Item Response Theory model. Cronbach's alpha was estimated to be .92 for both forms, indicating that the overall reliability of the scores obtained from the two forms was essentially equivalent. For most of the six linguistic subdomains, the average classical item difficulty indexes were very similar across the two forms. The average classical item discrimination index was also quite similar for the 2009 80-item test and the 2010 50-item test. However, 13 of the 2009 items and 3 of the 2010 items had point-biserial correlations that were either negative or lower than acceptable. A distractor analysis was conducted for each of these poorly discriminating items as a basis for revising them. The total information functions of the six subdomain tests (speaking, listening, reading, writing, vocabulary, and grammar) showed that most of the test information functions of the 2009 GECAT peaked at ability levels of roughly 0.9 < θ < 1.5, while those of the 2010 GECAT peaked at roughly 0.0 < θ < 0.6. Recommendations for improving the GECAT and conducting future research are included.
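The classical item statistics this study relies on (item difficulty as the proportion correct and point-biserial discrimination) are straightforward to compute from a scored response matrix. The sketch below uses a tiny hypothetical data set; the study itself analyzed thousands of real responses and added 3-PL IRT parameter estimates on top of these statistics.

    import numpy as np

    def ctt_item_stats(responses):
        """Classical item analysis for a 0/1 response matrix (rows = examinees).

        Returns per-item difficulty (proportion correct) and the corrected
        point-biserial discrimination (item vs. total score excluding that item).
        """
        X = np.asarray(responses, dtype=float)
        difficulty = X.mean(axis=0)
        total = X.sum(axis=1)
        discrimination = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            rest = total - X[:, j]              # total score without item j
            discrimination[j] = np.corrcoef(X[:, j], rest)[0, 1]
        return difficulty, discrimination

    # Hypothetical toy data: 6 examinees by 4 items.
    X = np.array([[1, 1, 0, 1],
                  [1, 0, 0, 1],
                  [1, 1, 1, 1],
                  [0, 0, 0, 1],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1]])
    p, r_pb = ctt_item_stats(X)
    print("difficulty:", p.round(2))
    print("point-biserial:", r_pb.round(2))

A negative or near-zero point-biserial flags an item for the kind of distractor analysis described above.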
5

A Study of the Predictive Validity of the Başkent University English Proficiency Exam Through the Use of the Two-Parameter IRT Model's Ability Estimates

Yapar, Taner 01 January 2003 (has links) (PDF)
The purpose of this study is to analyze the predictive power of the ability estimates obtained through the two-parameter IRT model for the English Proficiency Exam administered at Başkent University in September 2001 (BUSPE 2001). As a prerequisite analysis, the fit of the one- and two-parameter IRT models was investigated. The data used for this study were the test data of all 727 students who took BUSPE 2001 and the departmental English course grades of the students who passed. At the first stage, whether the assumptions of IRT were met was investigated. Next, the observed and theoretical distributions of the test data were compared using chi-square statistics. After that, the invariance of ability estimates across different sets of items and the invariance of item parameters across different groups of students were examined. At the second stage, the predictive validity of BUSPE 2001 and its subtests was analyzed using both classical test scores and the ability estimates of the better fitting IRT model. The findings revealed that the test met the assumptions of unidimensionality, local independence, and nonspeededness, but the assumption of equal discrimination indices was not met; whether the assumption of minimal guessing was met remained unclear. The chi-square statistics indicated that only the two-parameter model fit the test data. The ability estimates were found to be invariant across different item sets, and the item parameters were found to be invariant across different groups of students. The predictive validity based on IRT ability estimates exceeded the predictive validity calculated from classical total scores, both for the whole test and for its subtests. The reading subtest was the best predictor of future performance in departmental English courses among all subtests.
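The two-parameter ability estimates at the center of this study can be obtained by maximizing the 2PL likelihood for each response pattern. The sketch below is a minimal Newton-Raphson estimator with hypothetical item parameters and a single response vector; operational calibrations use dedicated IRT software and jointly estimated item parameters.

    import numpy as np

    def estimate_theta_2pl(u, a, b, n_iter=20):
        """Maximum-likelihood ability estimate under the 2PL via Newton-Raphson.

        u: 0/1 response vector; a, b: item discrimination and difficulty vectors.
        Assumes a mixed response pattern (all-correct/all-wrong has no finite MLE).
        """
        theta = 0.0
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
            score = np.sum(a * (u - p))           # gradient of the log-likelihood
            info = np.sum(a**2 * p * (1.0 - p))   # Fisher information at theta
            theta += score / info                 # Newton-Raphson step
        return theta, 1.0 / np.sqrt(info)         # MLE and asymptotic standard error

    # Hypothetical 5-item test and one response pattern.
    a = np.array([1.0, 1.2, 0.8, 1.5, 0.9])
    b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    u = np.array([1, 1, 1, 0, 0])
    theta_hat, se = estimate_theta_2pl(u, a, b)
    print(round(theta_hat, 3), "+/-", round(se, 3))

Predictive validity can then be examined by correlating such theta estimates (or classical total scores) with subsequent course grades, as in the study's second stage.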
6

What if? An Enquiry into the Semantics of Natural Language Conditionals

Hjálmarsson, Guðmundur Andri January 2010 (has links)
This thesis is essentially a portfolio of four disjoint yet thematically related articles, each dealing with some semantic aspect of natural language conditionals. The thesis opens with a brief introductory chapter that offers a short yet opinionated historical overview and a theoretical background of several important semantic issues concerning conditionals. The second chapter then deals with the truth values and truth conditions of indicative conditionals. So-called Gibbard Phenomenon cases have been used to argue that indicative conditionals construed in terms of the Ramsey Test cannot have truth values. Since that conclusion is somewhat incredible, several alternative options are explored. Finally, a contextualised revision of the Ramsey Test is offered which successfully avoids the threats of the Gibbard Phenomenon. The third chapter deals with the question of where to draw the so-called indicative/subjunctive line. Natural language conditionals are commonly believed to be of two semantically distinct types: indicative and subjunctive. Although this distinction is central to many semantic analyses of natural language conditionals, there seems to be no consensus on the details of its nature. While trying to uncover the grounds for the distinction, we will argue our way through several plausible proposals found in the literature. Upon discovering that none of these proposals seems entirely suited, we will reconsider our position and make several helpful observations about the nature of conditional sentences. And finally, in light of our observations, we shall propose and argue for plausible grounds for the indicative/subjunctive distinction. The fourth chapter offers semantics for modal and amodal natural language conditionals based on the distinction proposed in the previous chapter. First, the nature of modal and amodal suppositions will be explored. Armed with an analysis of modal and amodal suppositions, the corresponding conditionals will be examined further. Consequently, the syntax of conditionals in English will be uncovered for the purpose of providing input for our semantics. And finally, compositional semantics in generative grammar will be offered for modal and amodal conditionals. The fifth and final chapter defends Modus Ponens from alleged counterexamples. In particular, the chapter offers a solution to McGee's infamous counterexamples. First, the solutions hitherto offered to the counterexamples are all argued to be inadequate. After a couple of observations on the nature of the counterexamples, a solution is offered and demonstrated. The solution suggests that the semantics of embedded natural language conditionals is more sophisticated than their surface syntax indicates. The heart of the solution is a translation function from the surface form of natural language conditionals to their logical form. Finally, the thesis ends with a conclusion that briefly summarises the main conclusions drawn in its preceding chapters.
7

Multiple Outlier Detection: Hypothesis Tests versus Model Selection by Information Criteria

Lehmann, Rüdiger, Lösler, Michael January 2016 (has links)
The detection of multiple outliers can be interpreted as a model selection problem. The models that can be selected are the null model, which indicates an outlier-free set of observations, and a class of alternative models, which contain a set of additional bias parameters. A common way to select the right model is a statistical hypothesis test; in geodesy, data snooping is the most popular such test. Another approach arises from information theory: here, the Akaike information criterion (AIC) is used to select an appropriate model for a given set of observations. The AIC is based on the Kullback-Leibler divergence, which describes the discrepancy between the model candidates. Both approaches are discussed and applied to two test problems: the fitting of a straight line and a geodetic network. Some relationships between data snooping and information criteria are discussed. When the two are compared, the information-criterion approach turns out to be simpler and more elegant. Alongside the AIC, however, there are many alternative information criteria, which may select different outliers, and it is not clear which one is optimal.
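A minimal sketch of the information-criterion side of this comparison, for the straight-line test problem only: each alternative model adds one bias parameter for a suspected outlier, the models are fit by least squares under a Gaussian error assumption, and the model with the smallest AIC is selected. The data are synthetic and the names are placeholders; the paper's geodetic-network example and the data-snooping comparison are not reproduced here.

    import numpy as np

    def aic_linear_fit(X, y):
        """AIC of an ordinary least-squares fit with Gaussian errors (sigma^2 by ML)."""
        n = len(y)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1.0)
        k = X.shape[1] + 1                       # regression coefficients + error variance
        return 2 * k - 2 * loglik

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 30)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=x.size)
    y[12] += 3.0                                 # inject one gross error (hypothetical)

    X0 = np.column_stack([np.ones_like(x), x])   # null model: plain straight line
    aics = {"null": aic_linear_fit(X0, y)}
    for i in range(len(y)):                      # alternatives: one bias parameter each
        dummy = np.zeros(len(y))
        dummy[i] = 1.0
        aics[f"bias at obs {i}"] = aic_linear_fit(np.column_stack([X0, dummy]), y)
    best = min(aics, key=aics.get)
    print(best, round(aics[best], 2))            # should flag observation 12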
8

Robust Change Detection with Unknown Post-Change Distribution

Sargun, Deniz January 2021 (has links)
No description available.
9

以最大測驗訊息量決定通過分數之研究 / A Study of Standard Setting Using Maximum Test Information

謝進昌, Shieh, Jin-Chang Unknown Date (has links)
The purpose of this study is to apply the concept of maximum test information from item response theory to standard setting. Drawing on the historical development of standard setting, three facets are used to interpret the maximum test information approach: component combination and adjustment, a generalized test-construction process, and multiple sources of validity evidence. These concepts explain the reasonableness and appropriateness of using maximum test information to set a mastery standard. The reasonableness of the approach is further examined with respect to the meaning of the formula, item selection, and statistical power, establishing its theoretical basis for standard setting. The consistency of master/non-master classifications is then examined to provide additional validity evidence. Finally, methods for transforming test scores and for describing ability differences are discussed, so that test results can be interpreted both quantitatively and qualitatively. The main conclusions are as follows.

1. When the maximum test information approach is applied to standard setting, the resulting mastery standard shows satisfactory classification reliability after cross-validation, with at least 90% exact classification. Viewed through a confidence-interval lens, the standard derived from maximum test information can also serve as a useful starting reference point for experts to adjust. For score transformation, both the transformed classical test score approach and the test characteristic curve mapping method maintain satisfactory consistency in classifying masters and non-masters, so either combination strategy is worth considering.

2. When anchor points are used to interpret the mastery standard derived for the Basic Competency Test, non-masters need only basic academic knowledge and the ability to read simple graphs, whereas masters additionally require broad academic knowledge; the ability to interpret complex problems, data, and graphs; and logical reasoning and analysis of experimental results to reach related conclusions, or, at a higher level, advanced academic knowledge and the ability to synthesize and evaluate information from data and context.

3. Test length affects all three approaches (maximum test information, transformed classical test scores, and test characteristic curve mapping): the longer the test, the higher the classification consistency, which is consistent with most previous research. The analysis suggests that 20 items is a necessary minimum test length. Moreover, viewed in terms of the exact number of misclassifications, the factors that produce score differences during transformation are difficult for decision makers to control in practice, but increasing the test length disperses examinees across score points and thereby reduces the impact of misclassification.

4. Under heterogeneous test difficulty, the maximum test information approach adjusts ability estimation according to the item parameters and is therefore less affected, maintaining an acceptable level of classification consistency. In contrast, under a fixed mastery standard, the transformed classical test score approach and the test characteristic curve mapping method show clearly higher misclassification rates, which also reflects the deficiency of the current practice of using a fixed passing score of 60.

5. When score transformation is compared across easy, hard, and normal-difficulty tests, neither the type of test difficulty nor the transformation method severely affects classification consistency. Furthermore, because the maximum test information approach sets the cut score according to test difficulty (a lower mastery standard on an easy test and a higher one on a hard test), even large score differences after transformation do not lead to severe misclassification.

6. Regarding the interaction among test length, test difficulty, and anchor-item selection, test length and test difficulty are not the decisive factors in anchor-item selection; what matters is whether the ability level at which the test information is maximal coordinates with the anchor point.

In sum, the maximum test information approach yields satisfactory classification consistency, is supported by relatively robust and rigorous theory, and is accompanied by appropriate methods for interpreting test results, making it well suited to large-scale examinations. Government agencies and practitioners are therefore encouraged to consider this strategy for large-scale certification and qualification testing.
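The core mechanics of the approach (locate the ability level at which the test information function peaks, treat it as the cut on the theta scale, and carry it onto the raw-score metric through the test characteristic curve) can be sketched in a few lines. The 2PL item parameters below are hypothetical, and the sketch omits the classification-consistency checks and anchor-point interpretation that the thesis builds on top of this step.

    import numpy as np

    # Hypothetical 2PL item bank for illustration.
    a = np.array([0.9, 1.1, 1.3, 0.8, 1.6, 1.0, 1.2, 0.7])
    b = np.array([-1.5, -0.8, -0.2, 0.1, 0.4, 0.9, 1.3, 2.0])

    theta = np.linspace(-4.0, 4.0, 801)
    P = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))   # probability matrix (theta x items)
    test_info = (a**2 * P * (1.0 - P)).sum(axis=1)        # 2PL test information function
    cut_idx = test_info.argmax()
    theta_cut = theta[cut_idx]                            # theta where the test is most precise

    # Map the theta cut onto the raw-score metric via the test characteristic curve.
    tcc = P.sum(axis=1)                                   # expected number-correct score
    raw_cut = tcc[cut_idx]
    print(f"theta cut = {theta_cut:.2f}, expected raw-score cut = {raw_cut:.2f}")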
