I worry that we’re missing some of what we can learn from end-of-course ratings. I know I addressed this topic in another recent column, but student evaluations are ubiquitous: they are used by virtually every institution and completed by students in pretty much every course. And what have we to show for this endeavor? Better teaching? More learning? You’d be hard-pressed to find broad-based support for either of those outcomes.
The ratings do have career consequences at most places, although how serious those consequences are varies. Even after 50 years of research on ratings, the confused array of policies and practices associated with their use persists. My optimism about that getting sorted out has faded, but I’m still interested in what an individual faculty member can learn about their teaching from rating results. It’s easy to look for what’s obvious: whether the ratings have gone up or down, or whether there’s a comment that makes us feel good or proposes an interesting idea. But what might we find if we dug deeper into the data?
Teachers and administrators commonly assume that students’ overall ratings of a teacher’s effectiveness are based solely on features of the instruction. Some recent research (Curby et al., 2020), using an interesting study design, documented that the instructor accounted for only 22.90 percent of the variance in overall teaching effectiveness ratings. The course accounted for 8.61 percent. The occasion accounted for 6.43 percent; this covers factors like the time of day, the classroom, and the characteristics of particular student cohorts. That leaves a large chunk of the variance unexplained by any of these three sources on its own. “While the instructor—and presumably teaching—accounted for substantial variance in student course ratings, factors other than the instructor had a larger influence on student ratings” (p. 44).
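The size of that unexplained chunk is simple arithmetic. Add the three percentages reported above and subtract the total from 100. This assumes all three figures are shares of the same total variance. The remainder lumps together everything else in the model, including interaction effects and error. It is not a single unidentified factor.

```latex
\begin{aligned}
\text{instructor} + \text{course} + \text{occasion} &= 22.90\% + 8.61\% + 6.43\% = 37.94\% \\
\text{remainder} &= 100\% - 37.94\% = 62.06\%
\end{aligned}
```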
What’s going through students’ minds when they decide how to rate a teacher’s effectiveness? Is it different for every student? Does it depend on the course? What other variables are at play? This research team also looked at the interaction among instructor, course, and occasion: “This would be the variance attributable to particular instructors being rated higher for particular courses on particular occasions” (p. 46). Surprisingly, the largest share of the variance in this three-way interaction was measurement error, which left the researchers unable to disentangle the interaction effects from error. Put more simply, a lot goes through students’ minds when they make judgments about teaching effectiveness.
Even though the variables influencing these judgments are complex and hard to pin down empirically, they are still amenable to some level of personal analysis. Early in my faculty development career I found (and regrettably have since lost) an article by a faculty member who analyzed his teaching on the basis of 25 years of rating data. He tracked his ratings for each course he taught. He also recorded when the course met, the rooms he taught in, and what happened when he changed textbooks. He then searched for trends across multiple sections.
That got me thinking that it might be interesting to collect several years’ worth of rating data from a faculty member to see whether those results could be used to identify the best teaching situation for that instructor. Some of what influences our ratings we know from experience. Just like our students, some of us aren’t at our brightest at 8:00 a.m. or 8:00 p.m. And some of us do better with certain kinds of students. I worked with a chemistry professor whose sections of general ed chemistry were packed with students from every field but science.
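If you want to try this kind of tracking with your own results, here is a minimal Python sketch. It assumes you have already typed each section’s results into a list of records. It also assumes the overall rating is a single number per section, such as the section mean on the overall-effectiveness item. The field names ("overall", "time_slot", "course", "room") and the function name are hypothetical; use whatever your institution’s reports actually contain.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by(records, factor):
    """Group section-level rating records by one teaching condition.

    Returns {condition: (mean overall rating, number of sections)}.
    Each record is a dict; "overall" and the factor names are
    hypothetical field names, not a standard report format.
    """
    groups = defaultdict(list)
    for record in records:
        # Collect the overall rating under the value of the chosen factor,
        # e.g. the time slot or the room the section met in.
        groups[record[factor]].append(record["overall"])
    return {
        condition: (round(mean(scores), 2), len(scores))
        for condition, scores in groups.items()
    }
```

You could call `mean_rating_by(history, "time_slot")` on several years of records, then repeat it with "course" or "room". This would show whether certain conditions are consistently associated with lower ratings. The function returns the number of sections alongside each average on purpose. An average built from one or two sections says very little.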
If we looked at end-of-course ratings for evidence that reveals or confirms what influences how well we teach, it might be possible to advocate for those teaching circumstances. Should content expertise be the only qualification that matters when deciding who teaches which course? Of course, teachers need to know the content. But shouldn’t we also take advantage of the fact that some teachers are better in some courses than in others, for reasons not always related to content?
Insights about our teaching can be gleaned from many sources, including our course evaluations. Sometimes it’s easy to overlook them; we get them regularly, and usually the results don’t vary all that much. We’ll see more in rating data if we look at results over a longer time window. And maybe we should start with a different question: What can I learn from students about when I do my best teaching?
Curby, T., McKnight, P., Alexander, L., & Erchov, S. (2020). Sources of variance in end-of-course student evaluations. Assessment & Evaluation in Higher Education, 45(1), 44–53. https://doi.org/10.1080/02602938.2019.1607249
To sign up for weekly email updates from The Teaching Professor, visit this link.