Interest in and use of peer assessment have grown in recent years, and teachers are using it for a variety of reasons. It's an activity that can be designed to engage students, and, when well designed, it also encourages students to look at their own work more critically. On the research front, some studies of peer assessment have shown that it promotes critical thinking skills and increases motivation to learn. In addition, peer assessment is part of many professional positions, which makes it a skill worth developing in college.
But for teachers, several questions linger. What kinds of criteria are students using when they assess each other's work? Are those criteria like the ones their teachers use? Given the importance of grades, can students be objective, or do they only provide positive feedback and high marks? To what extent do peer assessments agree with those offered by the teacher?
Falchikov and Goldfinch's (2000) meta-analysis of 48 studies of peer assessment published between 1959 and 1999 reported a moderately strong correlation of .69 between teacher assessments and peer assessments done by students. A large educational psychology team decided it was time to update that research, especially given that a significant number of peer assessments are now completed digitally. They also wanted to learn more about the impact of certain factors on peer assessments.
This team analyzed 69 studies published since 1999. Unlike Falchikov and Goldfinch, they included studies done at K–12 grade levels, although there were only a small number of them. They found that the estimated average Pearson correlation between peer and teacher ratings was also moderately strong, at .63.
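For readers less familiar with the statistic both meta-analyses rely on, the sketch below shows how a Pearson correlation between teacher and peer ratings would be computed. The scores are hypothetical, invented purely for illustration; they are not drawn from either study.

```python
# Illustrative only: hypothetical rating data, not from either meta-analysis.
# Pearson's r measures how closely peer ratings track teacher ratings
# (1.0 = perfect agreement in ranking and spacing, 0 = no linear relation).
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical scores for five assignments (0-100 scale)
teacher = [72, 85, 90, 65, 78]
peer = [70, 88, 86, 70, 80]

print(round(pearson_r(teacher, peer), 2))
```

A value like the .63 reported here means peer and teacher ratings generally rise and fall together, but with noticeable disagreement on individual pieces of work.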
Most interesting in this recent research are findings about factors related to peer assessment. Here are some highlights:
- When the peer assessment is computer-assisted, the correlation drops to .50, but the researchers note a couple of issues: there is wide variation in the kind of computer involvement in peer assessment, and some studies provided no detail about how computers were used, so more research is needed. Correspondingly, the correlations were significantly higher when the peer assessments were paper based.
- As might be expected, the correlations were higher in graduate courses than in undergraduate courses.
- Group assessment correlations were significantly lower than individual assessments. The researchers hypothesize this is because assessment in groups involves interactions among group members and the dynamics within the group.
- Voluntary peer ratings showed more agreement with teacher ratings than when the peer assessments were compulsory.
- Interestingly, the correlations were also higher when the identity of the peer rater was known. Related research has documented that anonymous raters tend to be harsher. Also, when the rater's identity is revealed, there may be a greater chance that the rater will take the task seriously, paying closer attention and thereby providing more accurate ratings.
- The correlation between teacher and student ratings reached .69 when students provided both a rating score and comments. Having to make comments forces reviewers to look carefully at the work and to develop a rationale for their rating.
- When peer raters were involved in developing the assessment criteria, the correlations jumped to .86. The research team describes this finding as “striking”: “Discussion, negotiation, and joint construction of assessment criteria is likely to give students a great sense of ownership and investment in their evaluations” (p. 256). It also makes the criteria easier to understand and apply. A big surprise was that training the peer raters was not a variable that resulted in significantly higher correlations between peer and teacher assessments. The researchers think that the variable quality of the training across the studies may have made its effect difficult to capture.
What is noteworthy about this meta-analysis is the attempt to identify factors that affect the accuracy of student judgments about the work of their peers. The analysis assumes that teacher assessments are the gold standard: students should be making assessments similar to those of the teacher. It is useful to know which factors help to close the gap between teacher and student assessments. The research team notes, “We included only theoretically meaningful predictors that could be reliably coded. As a result, the current meta-analysis explained only about one-third of the variation of the agreement between peer and teacher ratings” (p. 258). This means there must be other factors influencing the correlation. For example, could the correlations be affected by whether the ratings were formative, designed to help the recipient improve, or summative, that is, counted as part or all of the grade?
This is relevant work with findings that should be considered in the decision to use peer assessments. As with so much of the research on instructional practices, the issue is less whether a particular approach is viable and more about the best ways to use it.
Falchikov, N., and Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287–322.
Li, H., Xiong, Y., Zhang, X., Kornhaber, M., Lyu, Y., Chung, K., and Suen, H. (2016). Peer assessment in the digital age: A meta-analysis comparing peer and teacher ratings. Assessment & Evaluation in Higher Education, 41(2), 245–264.