Evaluating the impact of errors made by English language learners on a high-stakes, holistically scored writing assessment
Because the number of English language learner (ELL) students in public schools continues to grow and public demand for school accountability has increased, it is more important than ever to uncover potential bias in high-stakes assessments. Texas is one state with an annual high-stakes assessment, formerly known as the Texas Assessment of Academic Skills (TAAS), which includes a direct assessment of writing. The writing portion is scored holistically, and ELL students must meet the same standard as their proficient English-speaking, or non-ELL, counterparts. Prior research has demonstrated that raters of holistically scored writing are susceptible to bias based on appearance and to irritation caused by an overwhelming number of certain kinds of errors. Furthermore, previous research has shown that ELL students are apt to make particular surface errors that may both irritate raters and stigmatize the writers. The equity issues underlying these findings led to the following research questions: 1) What is the nature of naturally occurring surface errors made by 8th grade ELL writers compared to those made by their proficient English-speaking peers on a high-stakes writing exam? 2) What is the nature of naturally occurring surface errors made by 8th grade writers who received a high score compared to those made by their peers who received a low score? 3) Is there an interaction between superficial errors and ELL status in the scoring of 8th grade TAAS writing exams? To determine whether raters of the state's writing exam are unduly influenced by the presence of surface errors in the writing of 8th grade ELL students, a stratified random sample of 50 ELL essays and 50 non-ELL essays was drawn from the 2002 administration. The essays were then parsed into t-units, and errors were coded into 15 categories that were determined inductively from the sample and from a review of the relevant literature. A 2 (ELL and non-ELL) × 2 (High Score and Low Score) MANOVA was performed.
Main effects were found for ELL status and for scoring status. Interactions were found for the following dependent variables: number of paragraphs, total number of errors, number of error-free t-units, and number of lexis errors per t-unit.
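As an illustration only, the per-essay measures named above could be derived from coded t-units roughly as follows. This is a minimal sketch, not the study's actual procedure: the `TUnit` and `Essay` structures, the function name `essay_measures`, and the "spelling" label are hypothetical, and only "lexis" is an error category named in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class TUnit:
    """One t-unit and the error categories coded in it (labels are illustrative)."""
    errors: list[str] = field(default_factory=list)


@dataclass
class Essay:
    paragraphs: int
    t_units: list[TUnit]


def essay_measures(essay: Essay) -> dict[str, float]:
    """Compute the four dependent measures mentioned in the abstract for one essay."""
    n_units = len(essay.t_units)
    total_errors = sum(len(t.errors) for t in essay.t_units)
    error_free = sum(1 for t in essay.t_units if not t.errors)
    lexis = sum(t.errors.count("lexis") for t in essay.t_units)
    return {
        "paragraphs": essay.paragraphs,
        "total_errors": total_errors,
        "error_free_t_units": error_free,
        # Guard against an essay with no t-units to avoid division by zero.
        "lexis_errors_per_t_unit": lexis / n_units if n_units else 0.0,
    }


# Hypothetical essay: four t-units, two of them error-free.
sample = Essay(
    paragraphs=3,
    t_units=[TUnit(["lexis", "spelling"]), TUnit(), TUnit(["lexis"]), TUnit()],
)
measures = essay_measures(sample)
```

Measures of this kind, computed for each of the 100 essays, would serve as the dependent variables in a 2 × 2 design with ELL status and score level as the factors.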