Evaluating courses without making tutors feel censured

Posted by Stuart on October 30, 2015

Most tutor evaluation forms are fairly awful, to be honest.

One big factor is the question format. A surprisingly high proportion of evaluations use Likert scales, where respondents are given a statement and asked to rate the extent to which they agree with it.

A good example is SEEQ (Marsh, 1982), which consists of 32 questions or so, using Likert scales throughout. Here are a couple of its questions:

“I have learned something I consider valuable”

  • Strongly disagree
  • Disagree
  • Neutral
  • Agree
  • Strongly agree

“Course materials were well prepared and carefully explained”

  • Strongly disagree
  • Disagree
  • Neutral
  • Agree
  • Strongly agree

Setting aside the double-barrelled question (the “and”) in the second of these, there is a kind of bias in asking people to agree with a statement. Let’s compare these to a behavioural observational version:

“To what extent do you consider you have learned something valuable?”

  1. Not at all
  2.  
  3.  
  4. Very

“How well prepared and explained were the course materials?”

  1. Not at all
  2.  
  3.  
  4. Very

Which do you think works better?

I don’t like Likert scales, for the reasons outlined by Eley & Stecher (1997) (sadly paywalled, but the abstract might be enough). Put simply, Likert scales elicit impressions, not objective assessments. They’re less valid, and have a significant tendency to drift to extremes. There’s even a kind of group pressure in them. Variance is higher than it should be. A Likert-based instrument such as SEEQ might be “reliable” and “valid”, but it is possible for evaluations to be both more reliable and more valid, and it actually doesn’t take much to do that.

Anyway, my story is this. A good while ago, I was responsible for running a course, and we had a fairly terrible questionnaire we were supposed to use. I had reservations about it for a long time: because the questionnaire asked students, more or less verbatim, what they thought of the tutor, the results tended to be very unreliable.

One anecdote: I got feedback praising me for fixing the printer, while a fellow tutor was criticised because they made students think about the course topics. All in all, the questionnaire permitted the students to be extremely subjective in their responses.

So I decided to do a small experiment. I threw together a short evaluation form in a day or so, using theory to guide it, including replacing the Likert scales with behavioural observational scales. There was more to it than this: we also focused very explicitly on the experience of the course (rather than what students thought of the tutor), because, really, as an institution that was what we cared about.

You can find the evaluation form here:

I’ve never formally tested it for validity and reliability, but we used it for many years afterwards as the primary evaluation instrument, and all the concerns we’d had from tutors about unhelpful evaluations stopped instantly.

Feel free to use it and adapt it.

References

  • Eley, M. G., & Stecher, E. J. (1997). A comparison of two response scale formats used in teaching evaluation questionnaires. Assessment & Evaluation in Higher Education, 22(1), 65–79.
  • Marsh, H. W. (1982). SEEQ: A reliable, valid, and useful instrument for collecting students' evaluations of university teaching. British Journal of Educational Psychology, 52, 77–95.