Comparing item selection methods in computerized adaptive testing using the rating scale model

dc.contributor.advisor: Dodd, Barbara Glenzing
dc.contributor.committeeMember: Whittaker, Tiffany A.
dc.contributor.committeeMember: Casabianca-Marshall, Jodi M.
dc.contributor.committeeMember: Hersh, Matthew A.
dc.creator: Butterfield, Meredith Sibley
dc.date.accessioned: 2016-11-08T20:51:04Z
dc.date.available: 2016-11-08T20:51:04Z
dc.date.issued: 2016-08
dc.date.submitted: August 2016
dc.date.updated: 2016-11-08T20:51:04Z
dc.description.abstract: Computerized adaptive testing (CAT), a form of computer-based testing that selects and administers items matched to the examinee's trait level, can be shorter than traditional fixed-length paper-and-pencil testing while maintaining comparable or greater measurement precision. Administration of computer-based patient-reported outcome (PRO) measures has increased recently in the medical field. Because PRO measures often have small item pools, administer few items, and serve populations in poor health, the benefits of CAT are especially advantageous. In CAT, maximum Fisher information (MFI) is the most commonly used item selection procedure because it is easy to use and computationally simple. Its main drawback, however, is the attenuation paradox: if the estimated trait level of the examinee is not the true trait level, the selected items will not maximize information at the true trait level, and measurement is less precise. To address this issue, alternative item selection methods have been proposed, but in previous studies these alternatives have not performed better than MFI. Recently, the gradual maximum information ratio (GMIR) item selection method was proposed, and previous findings suggest GMIR could be beneficial for a short CAT. This simulation study compared the GMIR and MFI item selection methods under conditions specific to the constraints of PRO measures. GMIR and MFI were compared under Andrich's rating scale model (ARSM) across two polytomous item pool sizes (41 and 82 items), two population latent trait distributions (normal and negatively skewed), and three combined maximum-number-of-items/minimum-standard-error stopping rules (5/0.54, 7/0.46, 9/0.40). The conditions were fully crossed. Performance was evaluated in terms of descriptive statistics of the final trait estimates, measurement precision, conditional measurement precision, and administration efficiency. Results showed that GMIR had better measurement precision when the test length was 5 items, with higher mean correlations between known and estimated trait levels, smaller mean bias, and smaller mean RMSE. No effect of item pool size or population latent trait distribution was found. Across item selection methods, measurement precision increased as test length increased, but with diminishing returns from 7 to 9 items.
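The abstract describes three mechanical pieces: category probabilities and Fisher information under Andrich's rating scale model (RSM), MFI versus GMIR item selection, and combined maximum-length/minimum-standard-error stopping rules. The Python sketch below illustrates these pieces under stated assumptions. The item parameters, the theta grid used to locate each item's maximum information, and the linear progress weight in select_item_gmir are illustrative inventions; they are not the dissertation's code and not the published GMIR criterion. Only the RSM probability formula and the information-equals-score-variance identity for Rasch-family polytomous models are standard results.

# Minimal sketch of RSM-based CAT item selection; item parameters and
# the GMIR weighting scheme here are illustrative assumptions.
import numpy as np

def rsm_category_probs(theta, delta, taus):
    """Category probabilities for one RSM item: score x has logit
    sum_{k<=x} (theta - delta - tau_k), with an empty sum for x = 0."""
    steps = theta - delta - np.asarray(taus)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    expv = np.exp(logits - logits.max())  # stabilized softmax
    return expv / expv.sum()

def rsm_item_information(theta, delta, taus):
    """For Rasch-family polytomous models, item information equals the
    conditional variance of the item score at theta."""
    probs = rsm_category_probs(theta, delta, taus)
    scores = np.arange(len(probs))
    mean = scores @ probs
    return (scores**2) @ probs - mean**2

def select_item_mfi(theta_hat, deltas, taus, administered):
    """MFI: pick the unused item with maximum information at theta_hat."""
    info = [(-np.inf if j in administered
             else rsm_item_information(theta_hat, d, taus))
            for j, d in enumerate(deltas)]
    return int(np.argmax(info))

def select_item_gmir(theta_hat, deltas, taus, administered,
                     n_given, max_items, grid=np.linspace(-4, 4, 81)):
    """Sketch of the GMIR idea only: early in the test, favor items whose
    information at theta_hat is high relative to their own maximum
    (robust to a poor interim estimate), then shift toward absolute
    information (pure MFI) as the test progresses. The linear weight w
    is an assumption, not the published criterion."""
    w = n_given / max_items
    best_val, best_j = -np.inf, -1
    for j, d in enumerate(deltas):
        if j in administered:
            continue
        info = rsm_item_information(theta_hat, d, taus)
        max_info = max(rsm_item_information(t, d, taus) for t in grid)
        val = w * info + (1.0 - w) * (info / max_info)
        if val > best_val:
            best_val, best_j = val, j
    return best_j

def should_stop(n_given, se, max_items, min_se):
    """Combined stopping rule from the study: stop at the maximum test
    length or once the standard error falls below the threshold."""
    return n_given >= max_items or se <= min_se

# Illustrative run under the 41-item pool, 5-item/0.54-SE condition.
rng = np.random.default_rng(0)
deltas = rng.normal(0.0, 1.0, size=41)  # made-up item locations
taus = [-0.8, 0.0, 0.8]                 # three thresholds, four categories
theta_hat, se, administered = 0.0, 1.0, set()
while not should_stop(len(administered), se, max_items=5, min_se=0.54):
    j = select_item_mfi(theta_hat, deltas, taus, administered)
    administered.add(j)
    # ... score the response, update theta_hat and se (e.g., via EAP) ...
    se *= 0.8  # placeholder update so this sketch terminates

Swapping select_item_mfi for select_item_gmir in the loop exercises the alternative criterion; the study's other stopping conditions (7 items/0.46 SE and 9 items/0.40 SE) would be run the same way.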
dc.description.department: Educational Psychology
dc.format.mimetype: application/pdf
dc.identifier: doi:10.15781/T25D8NH87
dc.identifier.uri: http://hdl.handle.net/2152/43680
dc.language.iso: en
dc.subject: Computer adaptive testing
dc.subject: Item selection
dc.subject: Simulation
dc.subject: Patient reported outcome
dc.subject: Gradual maximum information ratio
dc.subject: Andrich's rating scale model
dc.title: Comparing item selection methods in computerized adaptive testing using the rating scale model
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Educational Psychology
thesis.degree.discipline: Educational psychology
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle
Name: BUTTERFIELD-DISSERTATION-2016.pdf
Size: 10.95 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 1.85 KB
Format: Plain Text