  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Multilevel Mixture IRT Modeling for the Analysis of Differential Item Functioning

Dras, Luke 14 August 2023
A multilevel mixture IRT (MMixIRT) model for DIF analysis has been proposed as a way to gain greater insight into the source of nuisance factors that reduce the reliability and validity of educational assessments. The purpose of this study was to investigate the efficacy of an MMix2PL model in detecting DIF across a broad set of conditions in hierarchically structured, dichotomous data. Monte Carlo simulation was used to generate examinee response data under conditions common in the field of education. These include (a) two instrument lengths, (b) nine hierarchically structured sample sizes, (c) four latent class features, and (d) eight distinct DIF characteristics, allowing for an examination of 576 unique data conditions. DIF analysis was performed using an iterative IRT-based ordinal logistic regression technique, with the focal group identified through estimation of latent classes from a multilevel mixture model. For computational efficiency in analyzing 50 replications for each condition, model parameters were recovered using maximum likelihood estimation (MLE) with the expectation maximization algorithm. Performance of the MMix2PL model for DIF analysis was evaluated by (a) the accuracy in recovering the true class structure, (b) the accuracy of membership classification, and (c) the sensitivity in detecting DIF items and Type I error rates. Results from this study demonstrate that the model is predominantly influenced by instrument length and by the separation between the class mean abilities, referred to as impact. Enumeration accuracy improved by an average of 40% when analyzing the short 10-item instrument, but with 100 clusters enumeration accuracy was high regardless of the number of items. Classification accuracy was substantially influenced by the presence of impact. Under conditions with no impact, classification was unsuccessful: the match between model-based class assignments and examinees' true classes averaged only 53.2%. At best, with impact of one standard deviation, classification accuracy averaged between 66.5% and 70.3%. Misclassification errors then propagated forward to influence the performance of the DIF analysis. Detection power was poor, averaging only 0.34 across the analysis iterations that reached convergence. Additionally, the short 10-item instrument proved challenging for MLE, a condition in which a Bayesian estimation method appears necessary. Finally, this paper provides recommendations on data conditions that improve performance of the MMix2PL model for DIF analysis, and summarizes suggestions for several improvements to the MMix2PL analysis process that have the potential to improve the feasibility of the model for DIF analysis.
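To make the simulated conditions more concrete, the following is a minimal sketch of generating dichotomous 2PL responses for two latent classes with impact and uniform DIF. It is not the thesis's simulation code: the function names, the parameter values, and the omission of the multilevel (cluster) structure are all simplifying assumptions made for illustration.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response for every examinee-item pair."""
    theta = np.asarray(theta, dtype=float)[:, None]
    return 1.0 / (1.0 + np.exp(-a[None, :] * (theta - b[None, :])))

def simulate_two_class(n_per_class, a, b, impact, dif_items, dif_shift, rng):
    """Simulate responses for a reference class (0) and a focal class (1).

    impact    : difference in class mean abilities (hypothetical setup)
    dif_items : indices of items whose difficulty differs in the focal class
    dif_shift : amount added to the focal-class difficulty of those items
    """
    theta = np.concatenate([
        rng.normal(0.0, 1.0, n_per_class),      # reference class abilities
        rng.normal(-impact, 1.0, n_per_class),  # focal class abilities
    ])
    cls = np.repeat([0, 1], n_per_class)
    b_focal = b.copy()
    b_focal[dif_items] += dif_shift             # uniform DIF on selected items
    p = np.where(cls[:, None] == 0, p_2pl(theta, a, b), p_2pl(theta, a, b_focal))
    y = (rng.random(p.shape) < p).astype(int)   # Bernoulli draws
    return y, cls

rng = np.random.default_rng(0)
a = np.ones(10)                  # illustrative discriminations, 10-item form
b = np.linspace(-1.0, 1.0, 10)   # illustrative difficulties
y, cls = simulate_two_class(50, a, b, impact=1.0, dif_items=[0, 1],
                            dif_shift=0.5, rng=rng)
```

With zero impact the two classes differ only on the DIF items, which is consistent with the abstract's finding that class membership is hard to recover in that condition.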
262

Evaluating IRT- and CTT-based Methods of Estimating Classification Consistency and Accuracy Indices from Single Administrations

Deng, Nina 01 September 2011
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, the LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true" DC/DA indices in various conditions, and (3) to assess the impact of the choice of reliability estimate on the LL method. Four simulation studies were conducted. Study 1 looked at various test lengths. Study 2 focused on local item dependency (LID). Study 3 checked the consequences of IRT model-data misfit, and Study 4 checked the impact of using different scoring metrics. Finally, a real-data study was conducted in which no advantage was given to any model or assumption. The results showed that LID and model misfit had a negative impact on the "true" DA index and made all selected methods overestimate the DA index. In contrast, the DC estimates were minimally affected by these factors, although the LL method produced poorer estimates for short tests and the LEE and HH methods were less robust for tests with a high level of LID. Comparing the selected methods, the LEE and HH methods had nearly identical results across all conditions, while the HH method offered more flexibility with complex scoring metrics. The LL method was found to be sensitive to the choice of test reliability estimate. The LL method with Cronbach's alpha consistently underestimated DC, while LL with stratified alpha functioned noticeably better, with smaller bias and more robustness across conditions. Lastly, it is hoped that software will be made available soon to permit wider use of the HH method. The other methods in the study are already well supported by easy-to-use software.
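The two reliability estimates compared for the LL method can be sketched as follows. This is a minimal illustration using the standard textbook formulas for Cronbach's alpha and stratified alpha; the function names and the strata assignment are hypothetical, and it is not the software used in the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an examinee-by-item score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1.0 - item_var / total_var)

def stratified_alpha(scores, strata):
    """Stratified alpha: 1 - sum_j var_j * (1 - alpha_j) / var_total.

    strata assigns each item (column) to a content stratum; each stratum
    needs at least two items so that its own alpha is defined.
    """
    scores = np.asarray(scores, dtype=float)
    strata = np.asarray(strata)
    total_var = scores.sum(axis=1).var(ddof=1)
    penalty = 0.0
    for s in np.unique(strata):
        block = scores[:, strata == s]
        penalty += block.sum(axis=1).var(ddof=1) * (1.0 - cronbach_alpha(block))
    return 1.0 - penalty / total_var
```

When all items fall in a single stratum the two coefficients coincide; they diverge when the test mixes content areas or item types, which is one plausible reading of why the choice of estimate mattered for the LL method.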
263

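En jämförande studie mellan single-item-mätning med Borg centiMax skalan® och SPIN för social ångest / A comparative study of single-item measurement with the Borg centiMax Scale® and SPIN for social anxiety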

Kvaernå, Malin, Larsén, Jennifer January 2023
Social anxiety is described as a feeling of discomfort linked to social situations in which the individual perceives a risk of being negatively evaluated by others (APA, 2013). There is a great need to make the measurement of social anxiety more efficient and to measure its symptoms precisely. Furthermore, psychological constructs are difficult to measure reliably. The Borg centiMax Scale® (CR100) is a rating scale developed to measure subjective experiences with greater precision. The aim of this study is to examine the construct validity of single-item measurement of social anxiety with the Borg centiMax Scale® and to determine whether single-item measurement captures social anxiety in a manner equivalent to a multi-item instrument. To investigate this, single-item measurement with centiMax for social anxiety (SÅ-SI-cM) is compared with SPIN and with the social anxiety scale with centiMax (SÅS-MI-cM). Data were collected digitally through a convenience sample (N = 382). The results show a positive correlation between SPIN and single-item measurement with centiMax for social anxiety (r = .77, p < .001) and a positive correlation between single-item measurement with centiMax and multi-item measurement with SÅS-cMax (r = .77, p < .001). In summary, the results suggest that single-item measurement with centiMax measures social anxiety in a manner equivalent to SPIN and SÅS-MI-cM. The results of this study indicate that further studies of single-item measurement of social anxiety with centiMax (SÅ-SI-cM) are of interest in order to further validate this instrument.
264

Item Discrimination and Type I Error Rates in DIF Detection Using the Mantel-Haenszel and Logistic Regression Procedures

Li, Yanju 11 September 2012
No description available.
265

Kontexteffekte in Large-Scale Assessments / Context Effects in Large-Scale Assessments

Weirich, Sebastian 13 August 2015
The present doctoral thesis evaluates various item response theory methods and models for parametrizing context effects in large-scale assessments. Such effects may occur in quantitative educational assessments and may cause biased item and person parameter estimates. To decide whether context effects occur in individual cases and lead to biased parameters, specific IRT models have to be developed that parametrize context effects in addition to item and person effects. The thesis consists of three separate contributions. In the first contribution, a model for the estimation of context effects in an IRT framework is introduced. Item position effects are examined as an example of context effects in the framework of generalized linear mixed models. The statistical properties of the model are investigated in simulation studies, which underline the relevance of an appropriate test design. A balanced incomplete test design is necessary not only to obtain valid item parameters in the Rasch model, but also to guarantee unbiased estimation of position effects in more complex IRT models. The third contribution deals with the problem of missing background data in large-scale assessments. The effect that predicts the probability of a missing value on a certain variable is considered a context effect. Statistical methods of multiple imputation were applied to the problem of missing background data in large-scale assessments. In contrast to other approaches used so far in practice (dummy coding of missing values), unbiased population and subpopulation estimates were obtained in a simulation study for most conditions.
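As an illustration of what parametrizing an item position effect can look like, a Rasch model with a linear position term may be written as below. This is a common textbook formulation and an assumption here, not necessarily the exact specification used in the thesis; the symbols $\theta_p$ (person ability), $\beta_i$ (item difficulty), $k_{pi}$ (position at which person $p$ sees item $i$), and $\gamma$ (position effect per position step) are introduced for illustration only.

```latex
\operatorname{logit} P(X_{pi} = 1 \mid \theta_p)
  = \theta_p - \beta_i - \gamma \,(k_{pi} - 1)
```

In an unbalanced design each item appears at only one position, so $\beta_i$ and $\gamma (k_{pi} - 1)$ cannot be separated; rotating items across positions is what makes both identifiable.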
266

Untersuchung zur prädiktiven Validität von Konzentrationstests / An Investigation of the Predictive Validity of Concentration Tests

Schumann, Frank 12 September 2016
This thesis examined the validity of attention and concentration tests. The focus was on the influence of several critical variables on the predictive validity of these tests, in particular item difficulty and item homogeneity, test length and the course of the test, test diversification, and validity in the context of real personnel selection. In a total of five studies, these variables were varied systematically and analyzed with respect to their predictive validity for the (retrograde and concurrent) prediction of school and academic achievement (Realschule, Abitur, Vordiplom/Bachelor). Because the sample consisted of university students (i.e., was relatively homogeneous in performance), the correlations were expected to be somewhat underestimated. However, since validity in this work was determined comparatively for particular tests or experimental conditions, this should not matter. Study 1 (N = 106) first examined how difficult the items in an arithmetic concentration test should be to ensure good predictions. To this end, easy and more difficult items were compared with respect to their correlation with the criterion. Easy and more difficult test variants turned out to be about equally predictive. Study 2 (N = 103) examined the role of test length by comparing the predictive validity of a short version and a long version of an arithmetic concentration test. The short version proved more valid than the long version, and validity in the long version decreased over the course of the test. Study 3 (N = 388) focused on test diversification, examining whether intelligence is better measured with a single matrices test (Wiener Matrizen-Test, WMT) or with a test battery (Intelligenz-Struktur-Test, I-S-T 2000 R) in order to ensure good predictive validity. The results clearly favor the matrices test, which was about as valid as the test battery but more economical to administer. Studies 4 (N = 105) and 5 (N = 97) examined predictive validity for school achievement in the context of a real personnel selection situation. While the large test batteries, Wilde-Intelligenz-Test 2 (WIT-2) and Intelligenz-Struktur-Test 2000 R (I-S-T 2000 R), predicted only moderately well, the Komplexer Konzentrationstest (KKT), in particular the KKT arithmetic test, was an excellent predictor of school and academic achievement. On the basis of these findings, recommendations and practical guidance were given for the strategic use of test instruments in diagnostic professional practice.
267

[en] INFORMATION ASYMMETRY IN PRIVATE HEALTH INSURANCE CONTRACTS AND THE RELATIONSHIP BETWEEN MORBIDITY AND WORK MARKET: AN INVESTIGATION USING PNAD 2003 / [pt] ASSIMETRIA DE INFORMAÇÃO NA CONTRATAÇÃO DE PLANOS PRIVADOS DE SAÚDE E A RELAÇÃO ENTRE MORBIDADE DE MERCADO DE TRABALHO: UMA INVESTIGAÇÃO A PARTIR DA PNAD 2003

BERNARDO JOSÉ DE BRITO FERREIRA 22 July 2009
Knowing the profile of the Brazilian population covered by private health plans is essential for guiding the policies of the National Health Agency (ANS), the action strategies of health insurers and operators, and the stance of the many agents involved in this process. Our purpose is to do this from the perspective of the labor market, taking into account individuals' self-reported morbidity and controlling for demographic and socioeconomic variables. We start by presenting an exploratory study relating health plan ownership to these variables. We then fit logistic regression models to explain self-reported morbidity from the individual's position in the labor market, controlling for demographic variables. The same class of models was also used as a tool to investigate the existence of information asymmetry in this type of contract. Our results show that cases of information asymmetry are concentrated in some diseases. We could also identify groups of workers with a high propensity for certain diseases in certain major regions of the country.
268

Aplicação da teoria de resposta ao item na análise de diferença de gênero de sintomas depressivos na população universitária brasileira / Using item response theory to analyze gender difference in depressive symptoms in Brazilian college students

Sá Júnior, Antônio Reis de 15 April 2019
INTRODUCTION: Although it is widely held that there are no significant differences between men and women in terms of the symptoms they experience during depressive episodes, recent research suggests that subtle differences in symptom profile may exist and may point to fundamental gender differences in the pathophysiology of depressive states. Women are up to twice as likely as men to suffer from major depressive disorder during their lifetime, and a number of studies have found that depressed women tend to exhibit more "atypical" depressive symptoms and more anxiety and somatization symptoms than men. Some of the problems with the traditional approaches to measuring these differences are due to the shortcomings of the statistical theory underlying them, namely classical test theory (CTT). In recent years, item response theory (IRT) has started to replace CTT in test construction and test evaluation, and some authors have called for the use of IRT-based methods, instead of traditional methods based on total scores, to study clinically significant individual change. An often-used IRT model for polytomous item scores is the graded response model (GRM). The GRM is suitable for analyzing psychometric scales with ordered, increasing response categories. Potential differential item functioning of the items of the Beck Depression Inventory-II (BDI-II) is investigated by gender, comparing these subgroups of students. METHODS: The 21-item BDI-II was administered cross-sectionally to a representative sample of 12,677 Brazilian college students. Reliability was evaluated with Cronbach's alpha coefficient. Severity (b_i) and discrimination (a) parameters of each BDI-II item were calculated with the GRM. The influence of gender and age was tested for differential item functioning (DIF) within the IRT-based approach. RESULTS: The BDI-II presented good reliability (alpha = 0.91). Women had a significantly higher likelihood of depression (cut-off > 13) than men. In general, participants endorsed cognitive-somatic items more easily than affective items of the scale. "Guilty feelings", "suicidal thoughts", and "loss of interest in sex" were the items most likely to indicate greater depression severity (b3 >= 3.60). However, all BDI-II items showed moderate-to-high discrimination (a >= 1.32) for depressive state. While one item, "crying", was flagged for gender DIF, with women more likely than men to endorse this symptom, the global weight of this item on the total score was negligible. CONCLUSIONS: Although respondents' gender may influence the response pattern of depressive symptoms, the measures of self-reported symptoms did not inflate severity scores. These findings provide further support for the validity of using the BDI-II to assess depression in academic contexts and highlight the value of considering gender-related common symptoms of depression.
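For readers unfamiliar with the graded response model, the sketch below shows how category probabilities follow from one discrimination parameter and ordered thresholds. The function name and the specific parameter values are hypothetical and chosen only for illustration (a four-category item, like the 0 to 3 scoring of a BDI-II item); they are not estimates from the thesis.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.

    theta : latent trait value (scalar)
    a     : item discrimination
    b     : ordered thresholds b_1 < ... < b_{K-1} for K categories
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]

# Illustrative item: high last threshold means the top category is
# endorsed only at high trait levels.
probs = grm_category_probs(theta=0.0, a=1.5, b=[1.0, 2.5, 3.6])
```

A large final threshold, such as the b3 >= 3.60 reported for some items, means the highest response category becomes likely only for respondents far above the population mean on the latent trait.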
269

Teste adaptativo computadorizado nas avaliações educacionais e psicológicas / Computerized adaptative test in educational and psychological evaluation

Ricarte, Thales Akira Matsumoto 04 April 2013
Computerized Adaptive Tests (CATs) select the questions (items) presented to an individual step by step according to the individual's proficiency (latent trait level). A CAT can be based on an Item Response Theory (IRT) model for estimating the latent trait and selecting the next item at each step of the test. This work presents IRT models used in CATs found in the literature and describes methods of item calibration for creating and maintaining an item bank under Samejima's model (Samejima, 1969), estimation of the latent trait, item selection with constraints using the Shadow test approach, and commonly used stopping criteria. Simulations were conducted with a large bank (500 items) and a small bank (21 items), and the quality of the latent trait estimates (assessed through bias and mean squared error) was evaluated for CATs with different numbers of items. Samejima's model was applied to the responses of students to the English Proficiency Exam (EPE) of ICMC - USP, a test administered twice a year in paper-and-pencil format, to create an item bank and subsequently build a CAT. The model was also applied, for the same purposes, to the responses of clinical patients from the Hospital das Clínicas - USP to the Beck Depression Inventory (BDI), provided by Dr. Yuang-Pang Wang. Comparisons were made with the current methodology for evaluating English language proficiency in the EPE (Medida de Probabilidade Admissível, MPA) and for diagnosing depression with the BDI (criterion suggested by Kendall et al., 1987), demonstrating the advantages and greater richness of the results obtained with IRT and with the implemented CATs. In addition, a program, Same-CAT, was developed that stores item banks and allows the creation and administration of CATs with constraints through the Shadow test approach.
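The basic CAT loop described above (estimate the trait, pick the next item, repeat until a stopping rule fires) can be sketched as follows. To keep the example short it deliberately swaps in simpler techniques than the thesis uses: a dichotomous 2PL model instead of Samejima's model, unconstrained maximum-information selection instead of the Shadow test approach, EAP estimation on a grid, and a fixed-length stopping rule. All names and parameter values are hypothetical.

```python
import numpy as np

def prob(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of 2PL items at a given trait level."""
    p = prob(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def eap(responses, a, b, grid):
    """Expected a posteriori trait estimate with a standard normal prior."""
    log_post = -0.5 * grid ** 2
    for u, ai, bi in zip(responses, a, b):
        p = prob(grid, ai, bi)
        log_post = log_post + (np.log(p) if u else np.log1p(-p))
    w = np.exp(log_post - log_post.max())
    return float((grid * w).sum() / w.sum())

def run_cat(true_theta, a, b, n_items, rng):
    """Administer a fixed-length CAT to one simulated examinee."""
    grid = np.linspace(-4.0, 4.0, 161)
    available = np.ones(len(a), dtype=bool)
    used, responses, theta_hat = [], [], 0.0
    for _ in range(n_items):
        # Select the unused item that is most informative at the current estimate
        info = np.where(available, item_information(theta_hat, a, b), -np.inf)
        j = int(np.argmax(info))
        available[j] = False
        u = int(rng.random() < prob(true_theta, a[j], b[j]))  # simulated answer
        used.append(j)
        responses.append(u)
        theta_hat = eap(responses, a[used], b[used], grid)
    return theta_hat, used

rng = np.random.default_rng(0)
a = np.ones(21)                    # illustrative 21-item bank
b = np.linspace(-2.0, 2.0, 21)
theta_hat, used = run_cat(true_theta=1.0, a=a, b=b, n_items=8, rng=rng)
```

Repeating `run_cat` over many simulated examinees and comparing `theta_hat` with `true_theta` gives bias and mean squared error, the two quality measures named in the abstract.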
270

Neue Methoden zur Entdeckung von Fehlspezifikation bei Latent-Trait-Modellen der Veränderungsmessung / New Methods for Detecting Misspecification in Latent Trait Models for the Measurement of Change

Klein, Stefan 09 May 2003
In this thesis, new methods are developed for detecting misspecification in the Linear Logistic Test Model (LLTM) and related model classes for the measurement of change. The term "misspecification" is used when an incorrect pattern of latent traits is assumed in estimating the LLTM. Misspecification can lead to substantial estimation errors (see, e.g., [Baker,1993]). Under assumptions that are easy to satisfy, the newly developed methods make it possible to measure the extent of the deviation between the proposed model and the data without using estimated parameter values. First, a method based on the well-known Mantel-Haenszel test is introduced. For tests concerning the change parameters of an LLTM, this method can be used instead of a likelihood ratio test (e.g., [Fischer,1995a]). The main topic of this thesis is uniformly most powerful tests for the measurement of person fit and related effect measures. These effect measures can be used to identify subpopulations for which the proposed model does not hold. The statistical properties of these tests and effect measures are examined by simulation and power calculations using the SAS software. Furthermore, examples of the application of these methods are given.
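For orientation, the LLTM in its usual textbook form is a Rasch model whose item difficulties are constrained to be a linear combination of basic parameters. The notation below is the conventional one and is an assumption here, not copied from the thesis: $\theta_v$ is the ability of person $v$, $\beta_i$ the difficulty of item $i$, $\eta_j$ the basic (for example, change or treatment) parameters, $q_{ij}$ the fixed design weights, and $c$ a normalization constant.

```latex
P(X_{vi} = 1 \mid \theta_v)
  = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)},
\qquad
\beta_i = \sum_{j=1}^{m} q_{ij}\,\eta_j + c
```

In this formulation, misspecification corresponds to choosing a weight matrix $(q_{ij})$ that does not reflect the structure that actually generated the data.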
