Comparing item selection methods in computerized adaptive testing using the rating scale model

dc.contributor.advisor: Dodd, Barbara Glenzing
dc.contributor.committeeMember: Whittaker, Tiffany A.
dc.contributor.committeeMember: Casabianca-Marshall, Jodi M.
dc.contributor.committeeMember: Hersh, Matthew A.
dc.creator: Butterfield, Meredith Sibley
dc.date.accessioned: 2016-11-08T20:51:04Z
dc.date.available: 2016-11-08T20:51:04Z
dc.date.issued: 2016-08
dc.date.submitted: August 2016
dc.date.updated: 2016-11-08T20:51:04Z
dc.description.abstract: Computerized adaptive testing (CAT), a form of computer-based testing that selects and administers items matched to the examinee's trait level, can be shorter than traditional fixed-length paper-and-pencil testing while maintaining comparable or greater measurement precision. Administration of computer-based patient-reported outcome (PRO) measures has increased recently in the medical field. Because PRO measures often have small item pools, administer few items, and serve populations in poor health, the benefits of CAT are especially advantageous. In CAT, maximum Fisher information (MFI) is the most commonly used item selection procedure because it is easy to use and computationally simple. Its main drawback, however, is the attenuation paradox: if the estimated trait level of the examinee is not the true trait level, the selected items will not maximize information at the true trait level, and measurement is less precise. To address this issue, alternative item selection methods have been proposed, but in previous studies these alternatives have not performed better than MFI. Recently, the gradual maximum information ratio (GMIR) item selection method was proposed, and previous findings suggest GMIR could be beneficial for a short CAT. This simulation study compared the GMIR and MFI item selection methods under conditions specific to the constraints of PRO measures. GMIR and MFI were compared under Andrich's rating scale model (ARSM) across two polytomous item pool sizes (41 and 82 items), two population latent trait distributions (normal and negatively skewed), and three combined maximum-number-of-items/minimum-standard-error stopping rules (5/0.54, 7/0.46, 9/0.40). The conditions were fully crossed. Performance was evaluated in terms of descriptive statistics of the final trait estimates, measurement precision, conditional measurement precision, and administration efficiency. Results showed that GMIR had better measurement precision when the test length was 5 items, with higher mean correlations between known and estimated trait levels, smaller mean bias, and smaller mean RMSE. No effect of item pool size or population latent trait distribution was found. Across item selection methods, measurement precision increased as test length increased, but with diminishing returns from 7 to 9 items.
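The abstract describes three mechanical pieces: category probabilities and Fisher information under Andrich's rating scale model (RSM), MFI versus GMIR item selection, and combined maximum-length/minimum-standard-error stopping rules. The Python sketch below illustrates these pieces under stated assumptions. The item parameters, the theta grid used to locate each item's maximum information, and the linear progress weight in select_item_gmir are illustrative inventions; they are not the dissertation's code and not the published GMIR criterion. Only the RSM probability formula and the information-equals-score-variance identity for Rasch-family polytomous models are standard results.

# Minimal sketch of RSM-based CAT item selection; item parameters and
# the GMIR weighting scheme here are illustrative assumptions.
import numpy as np

def rsm_category_probs(theta, delta, taus):
    """Category probabilities for one RSM item: score x has logit
    sum_{k<=x} (theta - delta - tau_k), with an empty sum for x = 0."""
    steps = theta - delta - np.asarray(taus)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    expv = np.exp(logits - logits.max())  # stabilized softmax
    return expv / expv.sum()

def rsm_item_information(theta, delta, taus):
    """For Rasch-family polytomous models, item information equals the
    conditional variance of the item score at theta."""
    probs = rsm_category_probs(theta, delta, taus)
    scores = np.arange(len(probs))
    mean = scores @ probs
    return (scores**2) @ probs - mean**2

def select_item_mfi(theta_hat, deltas, taus, administered):
    """MFI: pick the unused item with maximum information at theta_hat."""
    info = [(-np.inf if j in administered
             else rsm_item_information(theta_hat, d, taus))
            for j, d in enumerate(deltas)]
    return int(np.argmax(info))

def select_item_gmir(theta_hat, deltas, taus, administered,
                     n_given, max_items, grid=np.linspace(-4, 4, 81)):
    """Sketch of the GMIR idea only: early in the test, favor items whose
    information at theta_hat is high relative to their own maximum
    (robust to a poor interim estimate), then shift toward absolute
    information (pure MFI) as the test progresses. The linear weight w
    is an assumption, not the published criterion."""
    w = n_given / max_items
    best_val, best_j = -np.inf, -1
    for j, d in enumerate(deltas):
        if j in administered:
            continue
        info = rsm_item_information(theta_hat, d, taus)
        max_info = max(rsm_item_information(t, d, taus) for t in grid)
        val = w * info + (1.0 - w) * (info / max_info)
        if val > best_val:
            best_val, best_j = val, j
    return best_j

def should_stop(n_given, se, max_items, min_se):
    """Combined stopping rule from the study: stop at the maximum test
    length or once the standard error falls below the threshold."""
    return n_given >= max_items or se <= min_se

# Illustrative run under the 41-item pool, 5-item/0.54-SE condition.
rng = np.random.default_rng(0)
deltas = rng.normal(0.0, 1.0, size=41)  # made-up item locations
taus = [-0.8, 0.0, 0.8]                 # three thresholds, four categories
theta_hat, se, administered = 0.0, 1.0, set()
while not should_stop(len(administered), se, max_items=5, min_se=0.54):
    j = select_item_mfi(theta_hat, deltas, taus, administered)
    administered.add(j)
    # ... score the response, update theta_hat and se (e.g., via EAP) ...
    se *= 0.8  # placeholder update so this sketch terminates

Swapping select_item_mfi for select_item_gmir in the loop exercises the alternative criterion; the study's other stopping conditions (7 items/0.46 SE and 9 items/0.40 SE) would be run the same way.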
dc.description.department: Educational Psychology
dc.format.mimetype: application/pdf
dc.identifier: doi:10.15781/T25D8NH87
dc.identifier.uri: http://hdl.handle.net/2152/43680
dc.language.iso: en
dc.subject: Computer adaptive testing
dc.subject: Item selection
dc.subject: Simulation
dc.subject: Patient reported outcome
dc.subject: Gradual maximum information ratio
dc.subject: Andrich's rating scale model
dc.title: Comparing item selection methods in computerized adaptive testing using the rating scale model
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Educational Psychology
thesis.degree.discipline: Educational psychology
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle
Name: BUTTERFIELD-DISSERTATION-2016.pdf
Size: 10.95 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 1.85 KB
Format: Plain Text