Editor’s note: The following article is part of a resource collection called It’s Worth Discussing, in which we feature research articles that are especially suitable for personal reflection and group discussion with your colleagues.
Why this article is worth discussing: For those interested in using course evaluation results to improve teaching, this article offers a set of evidence-based recommendations, clearly described and supported with multiple references. The review focuses exclusively on using end-of-course evaluation results for improvement purposes. It covers features of evaluations that generate good data, interpretation of the results, and development of action plans. It recognizes but does not consider evaluations’ use in the promotion and tenure process. By contrast, most reviews are more broadly based and not as pragmatic. The article is also worth discussing because research results indicate that end-of-course ratings tend to remain stable, meaning their regular use does not automatically improve teaching. End-of-course ratings can increase instructional effectiveness, however, and this article proposes a logical, sensible way of achieving that goal.
Boysen, G. A. (2016). Using student evaluation to improve teaching: Evidence-based recommendations. Scholarship of Teaching and Learning in Psychology, 2(4), 273–284. https://doi.org/10.1037/stl0000069
The four steps in this improvement process start with the instrument itself. First, it needs to be reliable and valid. In other words, it must measure what it’s supposed to measure and do so consistently. Second, the response rate needs to be adequate to ensure the data’s integrity. If there are 100 students in the course and only 10 complete the evaluation, that’s not a representative sample. The author discusses a variety of ways faculty can improve response rates. Third, good improvement decisions depend on a systematic analysis of the results: “In order for teachers to improve based on student evaluations, they must avoid haphazard interpretations based on simple heuristics” (p. 278). This need for careful review applies to quantitative as well as qualitative feedback. Finally, teachers need to set goals for improvement, and the evidence-based recommendation is to do that in consultation with a peer or an instructional expert.
When discussing or reflecting on feedback from students, it’s important to remember that the research on student evaluations is voluminous, with studies reporting a wide range of results. The literature can be cherry-picked to support any number of foregone conclusions. This review relies primarily on meta-analyses (large reviews that pool the results of many studies) to identify trends. It cites many individual studies as examples but does not base recommendations on isolated explorations. It also cites examples that refute the trends.
Validity involves how the instrument defines good teaching and whether the dimensions of teaching that the individual items identify can be connected to learning. Reliability includes empirical issues related to interpreting the items on the instrument.
“Many colleges, rather than using standardized measures with known reliability and validity, create their own student evaluation measures by haphazardly selecting survey questions with face validity” [ones that “look like” they’ll measure, in this case, teaching effectiveness] (p. 275).
“Teachers seeking more trustworthy feedback can select a standardized survey to administer for professional development purposes” (p. 275). Note: The article references three instruments, and the instruments themselves appear in the sources it cites.
“Just as students need specific feedback on their performance in order to learn, teachers need specific, multidimensional feedback on their pedagogical skills if they seek to improve. Single items [“Overall, rate the quality of this instructor”] cannot provide such feedback” (p. 276).
“The perspective of students matters” (p. 274). This is how the author responds to arguments that students aren’t qualified to evaluate teaching or that their satisfaction with the course and instructor doesn’t matter. He also establishes the validity of ratings by listing six indicators of teaching quality that student evaluations predict. These include teachers’ self-evaluations, the ratings of trained observers, alumni ratings, student predictions of their own learning, objective measures of student achievement, and ratings of the same instructor in other courses (see p. 274).
Most institutions have moved to online evaluations, which have lowered response rates and raised concerns about who’s completing the evaluations and the fairness of their assessments.
“From a psychometric perspective, low response rates increase measurement error, which impedes the ability to make decisions from the data” (p. 277). Sampling theory proposes that a 3 percent margin of error requires a 97 percent response rate in a class of 20, a 93 percent response rate in a class of 50, and an 87 percent response rate in a class of 100. A 10 percent margin of error for the same class sizes requires response rates of 58 percent, 35 percent, and 21 percent, respectively. There is not yet agreement as to an appropriate margin of error for course evaluations. (See the discussion on p. 277.)
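For readers who want to see where figures like these can come from, here is a minimal sketch using the standard finite-population sample-size formula. The function name, the 80:20 split between favorable and unfavorable responses, and the confidence levels (95 percent for the 3 percent margin, 80 percent for the 10 percent margin) are assumptions not stated in the summary above; they are used here only because, under them, the formula reproduces the quoted percentages.

```python
def required_response_rate(class_size, margin, z, split=0.8):
    """Share of a class that must respond to reach a given margin of error.

    Standard finite-population sample-size formula:
        n = N*p*(1-p) / ((N-1)*(margin/z)**2 + p*(1-p))
    `split` (p) is the assumed proportion of favorable responses.
    """
    variance = split * (1 - split)
    needed = class_size * variance / (
        (class_size - 1) * (margin / z) ** 2 + variance
    )
    return needed / class_size


# Assumed confidence levels: z = 1.96 (95 percent) and z = 1.28 (80 percent).
for size in (20, 50, 100):
    strict = required_response_rate(size, margin=0.03, z=1.96)
    lenient = required_response_rate(size, margin=0.10, z=1.28)
    print(f"class of {size}: {strict:.0%} (3% margin), {lenient:.0%} (10% margin)")
```

Changing `split`, `margin`, or `z` shows how sensitive the required response rate is to these choices, which is one reason there is no agreed-upon threshold.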
As for the reduced response rates with online evaluations, “response rates have varied between studies, but it is safe to assume that at least 20% fewer students will complete online versus a face-to-face student evaluation survey” (p. 276).
“Online evaluations do not appear to be dominated by students who earned low grades and who, on average, tend to give lower evaluations of their teachers” (p. 276).
“There is no reason to settle for a low response rate because teachers have a wide variety of techniques at their disposal to increase participation” (p. 277). The author suggests that the prevalence of electronic devices makes it possible to complete online evaluations during class. He also recommends explanations as to why the feedback matters, repeated reminders, and incentives.
Interpreting course evaluation feedback isn’t always easy. Sometimes the results conflict. Sometimes the ratings change just a little bit. Occasionally, it doesn’t look like anything needs to improve. And every now and then a student offers a blistering critique of the course and instructor. There’s a need to look at rating data systematically and objectively.
“Student evaluation results represent scientific data, but the research suggests that faculty readily interpret that data without reference to established statistical principles” (p. 278).
“Because of the error that is inherent in any psychological measurement, student evaluations are not precise representations of teaching effectiveness” (p. 278).
“Just as researchers would never make conclusions about the results of a study based on raw means, teachers should not try to make pedagogical improvements based on unsystematic comparisons of raw student evaluation means” (p. 279).
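One common way to look past raw means is to put an interval around each one. The sketch below is an illustration only, not necessarily the procedure the author recommends: the ratings are hypothetical, the function name is invented for this example, and the interval uses a normal approximation that is rough for small classes.

```python
import math
import statistics


def mean_with_interval(ratings, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation interval."""
    mean = statistics.mean(ratings)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, mean - z * se, mean + z * se


# Hypothetical ratings (1-5 scale) on the same item in two terms.
fall = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]
spring = [4, 4, 3, 5, 4, 3, 4, 5, 4, 4]

fall_mean, fall_lo, fall_hi = mean_with_interval(fall)
spring_mean, spring_lo, spring_hi = mean_with_interval(spring)

# Overlapping intervals are a signal to be cautious about the raw gap.
intervals_overlap = fall_lo <= spring_hi and spring_lo <= fall_hi

print(f"fall:   {fall_mean:.2f} ({fall_lo:.2f} to {fall_hi:.2f})")
print(f"spring: {spring_mean:.2f} ({spring_lo:.2f} to {spring_hi:.2f})")
print("intervals overlap:", intervals_overlap)
```

In this made-up case the means differ by 0.3 points, yet the intervals overlap, so the drop alone would be weak grounds for changing how the course is taught.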
“[Student] comments are presented as an unorganized mass in student evaluation reports, and this leads teachers to review and utilize them in a similarly unorganized way” (p. 279).
The information derived from course evaluations accomplishes nothing unless it’s acted on.
“Several longitudinal investigations have followed trends in student evaluations among the same group of teachers across multiple years, and the results indicate that evaluations remain stable despite the multiple rounds of feedback received by teachers” (p. 279).
“Meta-analysis indicates that teachers should discuss their evaluation results with a peer or instructional expert” (p. 280).
“Teachers should set goals for improvement” (p. 280).
“In general, improvement of teaching includes steps that are typical of all types of behavior modification—evaluating the current behavior, determining what needs to be altered, and acting on a specific plan for change” (p. 280).