An evaluation of item difficulty and person ability estimation using the multilevel measurement model with short tests and small sample sizes

dc.contributor.advisor: Beretvas, Susan Natasha (en)
dc.contributor.committeeMember: Dodd, Barbara G. (en)
dc.contributor.committeeMember: Pituch, Keenan A. (en)
dc.contributor.committeeMember: Powers, Daniel A. (en)
dc.contributor.committeeMember: Zimmaro, Dawn M. (en)
dc.creator: Brune, Kelly Diane (en)
dc.date.accessioned: 2011-06-08T14:32:51Z (en)
dc.date.available: 2011-06-08T14:32:51Z (en)
dc.date.available: 2011-06-08T14:33:00Z (en)
dc.date.issued: 2011-05 (en)
dc.date.submitted: May 2011 (en)
dc.date.updated: 2011-06-08T14:33:00Z (en)
dc.description: text (en)
dc.description.abstract: Recently, researchers have reformulated Item Response Theory (IRT) models as multilevel models so that clustered data can be evaluated appropriately. Using a multilevel model to obtain item difficulty and person ability parameter estimates that correspond directly with IRT models’ parameters is often referred to as multilevel measurement modeling. Unlike conventional IRT models, multilevel measurement models (MMMs) can accommodate predictor variables, model clustered data appropriately, and be estimated using non-specialized computer software, including SAS. For example, a three-level model can model the repeated measures (level one) of individuals (level two) who are clustered within schools (level three). The minimum sample size and number of test items needed to obtain reasonable one-parameter logistic (1-PL) IRT parameter estimates have not been examined for either the two- or three-level MMM. Researchers (Wright and Stone, 1979; Lord, 1983; Hambleton and Cook, 1983) have found that sample sizes under 200 and tests with fewer than 20 items result in poor model fit and poor parameter recovery for dichotomous 1-PL IRT models, even with data that meet model assumptions. This simulation study tested the performance of the two-level and three-level MMMs under various conditions that included three sample sizes (100, 200, and 400), three test lengths (5, 10, and 20), three level-3 cluster sizes (10, 20, and 50), and two generated intraclass correlations (.05 and .15). The study demonstrated that the two- and three-level MMMs lead to somewhat divergent results for item difficulty and person-level ability estimates. The mean relative item difficulty bias was lower for the three-level model than for the two-level model. The opposite was true for the person-level ability estimates, with a smaller mean relative parameter bias for the two-level model than for the three-level model.
There was no difference between the two- and three-level MMMs in the school-level ability estimates. Modeling clustered data appropriately; having a minimum total sample size of 100 to accurately estimate level-2 residuals and a minimum total sample size of 400 to accurately estimate level-3 residuals; and having at least 20 items will help ensure valid statistical test results. (en)
dc.description.department: Educational Psychology (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2011-05-2999 (en)
dc.language.iso: eng (en)
dc.subject: Multilevel measurement model (en)
dc.subject: MMM (en)
dc.subject: Item response theory (en)
dc.subject: IRT (en)
dc.subject: Hierarchical generalized linear modeling (en)
dc.subject: Testing (en)
dc.title: An evaluation of item difficulty and person ability estimation using the multilevel measurement model with short tests and small sample sizes (en)
dc.type.genre: thesis (en)
thesis.degree.department: Educational Psychology (en)
thesis.degree.discipline: Educational Psychology (en)
thesis.degree.grantor: University of Texas at Austin (en)
thesis.degree.level: Doctoral (en)
thesis.degree.name: Doctor of Philosophy (en)
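
The simulation design summarized in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code (the abstract mentions SAS as one software option), and it rests on several assumptions not stated in the record: the four factors are fully crossed, "cluster size" is read as the number of persons per school, total ability variance is fixed at 1 so that the intraclass correlation equals the school-level variance, item difficulties are drawn from a standard normal distribution, and relative bias is taken as (estimate - true) / true. All function names are hypothetical.

```python
import itertools
import math
import random

# Factor levels quoted in the abstract.
SAMPLE_SIZES = (100, 200, 400)   # total number of persons
TEST_LENGTHS = (5, 10, 20)       # number of items
CLUSTER_SIZES = (10, 20, 50)     # assumed: persons per school
ICCS = (0.05, 0.15)              # generated intraclass correlations


def design_conditions():
    """Return every combination of the four factors (assumed fully crossed)."""
    return list(itertools.product(SAMPLE_SIZES, TEST_LENGTHS, CLUSTER_SIZES, ICCS))


def simulate_responses(n_persons, n_items, cluster_size, icc, rng):
    """Generate dichotomous 1-PL responses for persons nested in schools.

    Assumption: total ability variance is 1, so the school-level variance
    equals the ICC and the person-level variance equals 1 - ICC.
    """
    school_sd = math.sqrt(icc)
    person_sd = math.sqrt(1.0 - icc)
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    abilities, responses = [], []
    for _ in range(n_persons // cluster_size):
        school_effect = rng.gauss(0.0, school_sd)
        for _ in range(cluster_size):
            theta = school_effect + rng.gauss(0.0, person_sd)
            abilities.append(theta)
            # 1-PL: P(correct) = 1 / (1 + exp(-(theta - b)))
            row = [
                1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
                for b in difficulties
            ]
            responses.append(row)
    return difficulties, abilities, responses


def relative_bias(estimate, true_value):
    """Relative bias of one estimate (assumed definition; needs true_value != 0)."""
    return (estimate - true_value) / true_value
```

For example, `simulate_responses(100, 5, 10, 0.15, random.Random(0))` generates one data set for the smallest sample size and shortest test; fitting the two- and three-level models to such data is outside the scope of this sketch.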

Access full-text files

Name: BRUNE-DISSERTATION.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format
