A test of the reliability of student ratings over time
Reaching a true consensus on how to define or evaluate effective teaching remains a challenge for researchers, administrators, faculty developers, and instructors in higher education, and as a result the use of student ratings continues to be debated. The purpose of this study was to measure how consistently students rated teaching effectiveness in their class under different conditions. It compared student ratings of global items gathered during the semester with those gathered at the end of the semester, and it compared the end-of-semester ratings of students who had provided mid-semester feedback twice with the ratings of those who had not. Participants were 394 undergraduate students enrolled in seven sections of five courses. Within each of the seven classes, participants were randomly assigned to one of two groups: one that was primed by completing an online survey twice during the semester, and one that completed an alternate activity at the same time points. Both groups then completed the university’s Students’ Evaluation of Teaching survey with the rest of the class at the end of the semester. After the last day of classes, participants were also invited to attend a focus group session to discuss their experiences in the study. Analyses of the quantitative survey data indicated that, in all of the classes, responses to individual items during the semester did not differ significantly from those at the end of the semester. In six of the classes, no significant differences were found between primed and non-primed students on the final survey; in the remaining class, however, non-primed students responded more consistently than primed students.
Additionally, although the primed group's ratings on the two mid-semester surveys did not differ significantly in six of the classes, in one class the ratings of primed students became less consistent by the second mid-semester survey. Qualitative data from survey comments and focus group sessions were also examined for patterns. Explanations of the findings, implications of the study, and directions for future research are discussed.