Extension of the item pocket method, allowing for response review and revision, to a computerized adaptive test using the generalized partial credit model
The use of Computerized Adaptive Testing (CAT) has increased in the last few decades, due in part to the increased availability of personal computers, but also to the benefits of CATs themselves. CATs provide increased measurement precision of ability estimates while decreasing the demand on examinees through shorter tests. This is accomplished by tailoring the test to each examinee, selecting items that are neither too difficult nor too easy based on the examinee’s interim ability estimate and responses to previous items. These benefits come at the cost of the flexibility to move through the test as an examinee would with a Paper and Pencil (P & P) test. The algorithms used in CATs for item selection and ability estimation require restrictions on response review and revision; however, a large portion of examinees desire options for reviewing and revising responses (Vispoel, Clough, Bleiler, Hendrickson, and Ihrig, 2002). Previous research on response review and revision in CATs has examined only limited options, typically allowing review only after all items had been administered. The development of the Item Pocket (IP) method (Han, 2013) has allowed for response review and revision during the test, relaxing these restrictions while maintaining an acceptable level of measurement precision. This is achieved by creating an item pocket into which items are placed; pocketed items are excluded from the interim ability estimation and item selection procedures. The initial simulation study was conducted by Han (2013), who investigated the IP method using a dichotomously-scored fixed-length test. The findings indicated that the IP method did not substantially decrease measurement precision, and bias in the ability estimates was within acceptable ranges for operational tests. The present simulation study extended the IP method to a CAT with polytomously-scored items under the Generalized Partial Credit model, with exposure control and content balancing.
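The core of the IP mechanism described above can be sketched in a few lines: category probabilities under the Generalized Partial Credit model, and an interim ability estimate that skips any item currently sitting in the pocket. This is an illustrative sketch only; the function names, the grid-search maximum-likelihood estimator, and the item parameters are assumptions for demonstration, not the implementation used in the study.

```python
import math

def gpc_probs(theta, a, b_steps):
    # Category probabilities under the Generalized Partial Credit model.
    # a: item discrimination; b_steps: step difficulties (m steps -> m+1 categories).
    # Cumulative logits: z_0 = 0, z_k = sum_{v<=k} a * (theta - b_v)
    z = [0.0]
    for bv in b_steps:
        z.append(z[-1] + a * (theta - bv))
    m = max(z)  # subtract max before exponentiating, for numerical stability
    ez = [math.exp(v - m) for v in z]
    total = sum(ez)
    return [v / total for v in ez]

def interim_theta(responses, pocket, grid=None):
    # Grid-search MLE of theta over administered items, excluding pocketed ones.
    # responses: {item_id: (score, a, b_steps)}; pocket: set of item_ids in the IP.
    if grid is None:
        grid = [x / 100.0 for x in range(-400, 401)]  # theta in [-4, 4]
    best_theta, best_ll = 0.0, float("-inf")
    for th in grid:
        ll = 0.0
        for item_id, (score, a, b_steps) in responses.items():
            if item_id in pocket:
                continue  # IP items are withheld from interim estimation
            ll += math.log(gpc_probs(th, a, b_steps)[score])
        if ll > best_ll:
            best_theta, best_ll = th, ll
    return best_theta
```

Because pocketed items contribute no likelihood terms, moving an item into the pocket can shift the interim estimate, which is why the IP size must be capped to preserve measurement precision.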
The IP method was implemented in tests with three IP sizes (2, 3, and 4), two termination criteria (fixed and variable), two test lengths (15 and 20), and two item completion conditions (forced to answer and ignored) for items remaining in the IP at the end of the test. Additionally, four traditional CAT conditions, without the IP method, were included in the design. Results indicated that the longer, 20-item IP conditions using the forced-answer method had higher measurement precision, yielding higher mean correlations between known and estimated theta and lower mean bias and RMSE; measurement precision also increased as IP size increased. The two item completion conditions (forced to answer and ignored) resulted in similar measurement precision. The variable-length IP conditions resulted in measurement precision comparable to the corresponding fixed-length IP conditions. Implications of the findings, limitations, and suggestions for future research are also discussed.