Peer review could improve teaching assessment

As academics, our most profound influence on the world will be in training and shaping the next generation of students. Linda Nilson, director of Clemson University’s Office of Teaching Effectiveness and Innovation, likens the challenge of teaching well under current circumstances to being asked to perform “magic.”

In a time of shrinking resources, the demands on university teachers are greater than ever: larger classes, more diverse student populations (including large numbers that are under-prepared), higher research expectations, and more competition for employment. Teaching excellence is becoming a more difficult summit to climb. The most precarious aspect of that climb is a missing support – the lack of a reliable system for assessing and rewarding teaching.

As research expectations increase, there is less time and attention for teaching. There is often a general perception that those who emphasize teaching are unable to conduct research on the same level. But there is no evidence that excellent researchers cannot become excellent teachers.

Research is visible and brings in outside money to the university, reinforcing the notion that competence in teaching is all that is required. Yet, both the quality of undergraduate education and the capacity of graduate students to teach directly affect the reputation of the institution, the recruitment of researchers, and the intellectual vitality of the instructors, and thus the ability of faculty to succeed in research.

Most academic units give teaching equal weight in their formal systems of review, but in reality, particularly in tenure decisions, teaching often does not figure in the external reviews. While most departments require a combination of student evaluations, a teaching portfolio, a review of syllabi and peer reviews for research, peer review of teaching is rarely carried out. There are almost no rewards for attending teaching workshops.

In judging teaching excellence, most departments rely too much on quantitative student surveys. Most simply look at the average for overall student satisfaction, often ignoring other pertinent questions, such as the amount of learning that took place, course workload, course difficulty and expected grade.

Studies show that grade distribution has a weak, if any, correlation with excellent teaching. Relying on student evaluations provides no chance to examine the actual feedback, or improvement in student work. Researchers have found a wide variety of variables directly affect evaluations, including the time of day that the class is held, class size and class level, but these factors are never taken into account in student evaluations. As T. Baldwin and N. Blattner state (“Guarding against potential bias in student evaluations,” 2003), “Most of us would not dream of giving our students a final exam as the only measure of their performance in the class.”

Quantitative student evaluations do not really get at the range of skills taught in a course. The skills might include “complex and transformative learning” that goes beyond absorbing facts towards developing thoughtful analysis and interpretation. Nor do course-based teacher evaluations get at a department’s overall vision and goals for teaching, if those exist at all.

What could accomplish these kinds of assessments? One possibility is peer review. Nancy Van Note Chism sums up the issue this way in her 2007 book Peer Review of Teaching:

Specifically, experts indicate that while students are the most appropriate judges of day-to-day teacher behaviour and attitudes in the classroom, they are not the most appropriate judges of the accuracy of course content, use of acceptable teaching strategies in the discipline and the like. For these kinds of judgments, peers are the most appropriate source of information… (including giving feedback on): subject matter expertise, course goals, instructional materials and methods, assessment and grading practices, student achievement, professional and ethical behaviour, and thesis supervision.

But universities and faculty have not taken seriously the basic methodological principles or long-term outcomes of research design for evaluating teaching performance. A 2008 in-depth focus group study of teaching evaluations by J.J. Titus revealed that most student evaluations responded most directly to how the student “felt” about the course in terms of how much they enjoyed it; whether the instructor was “likeable” and “not boring”; whether the professor agreed with the student’s perspective on the subject matter; and whether a grade was given commensurate with their own perception of their effort. The author notes that no follow-up studies of the methodological effectiveness of teaching surveys are apparent, that survey scores are used as the sole measure of teaching effectiveness, and that they are treated as ordinal (sliding scale) rather than the categorical variables that they are (as averages are compared). Basic statistical rules of validity, reliability and tests of range and variance are largely ignored in such systems, though outcomes are considered unimpeachable in review systems. The end result is an academic culture where the student is client and consumer.

Graduate education perpetuates the cycle by emphasizing research and paying little or no attention to preparing graduate students for teaching or service. Comprehensive studies of graduate education reveal a dismaying lack of preparation for academic life, with little systematic structure to graduate training in terms of expectations or guidance.

Survey on teaching rewards

To find out whether the clues from our personal experience and from the literature accurately reflected assessment practices in Canada, we conducted a survey of political science departments across the country. Our focus was on traditional departments teaching full-time students on campus.

Our study confirmed that while salary review procedures formally put teaching on an equal footing with research, almost no one thinks teaching is given equal weight in reality. The responses to a series of questions about how teaching was evaluated were not surprising. We found an almost exclusive reliance on student questionnaires. Adequate teaching is emphasized at the tenure-review stage, but not on an ongoing, periodic basis.

The message to new scholars is clear: “You can’t substitute teaching for scholarly activity and remain a credible university professor,” one interviewee stated bluntly. Some institutions that had valued teaching in the past are now emphasizing research excellence to the exclusion of teaching, said respondents. One noted that his institution is going through “a metamorphosis.” Whereas teaching used to be “a really important criterion in terms of people’s career advancement,” now it has become “rapidly less important because all of the criteria for promotion have to relate to research.” As a result, academic staff try to avoid teaching because they need to perform well in other areas: “The problem with this is, of course, someone has to teach the students! There seems to be very little incentive to take teaching seriously at the moment.”

There has to be some way to give appropriate signals, via incentives, for assessing and improving performance in teaching. The current system provides none, and the result is continual internal political battles over the “burden” of teaching and service. As one interviewee stated, “we used to have ‘merit.’ Lousy teaching and poor service may have prevented the odd person from getting merit, but very few people ever got merit for teaching or service. It has now been abolished in favour of a series of awards.”

The evaluation of teaching also needs an overhaul. Evaluation should be based on course materials, delivery, and outcomes. Departments should state the core topics, skill sets, opportunities for real work experience, and a sequential set of courses, including possibly a capstone course, that allow students to build up and reflect upon the learning process.

There are two main approaches to achieving such goals at the departmental level. One is to set up a clear mission, evaluative criteria, assessment procedures and course planning from the top down: from the university, through the department chair, to the faculty. The second approach is more decentralized and, where there is disagreement, lets the academic staff themselves get together and decide upon such matters. In either case, objective assessment, peer review and incentive systems are vital to success.

Once teaching matters, the process should be seen by academic staff as a method of collectively improving their performance, recognizing the diverse talents of colleagues, and improving student outcomes. Assessment at the departmental or university level could include studies of students, alumni, academic staff and outside employers. The wide variety of methods we use to evaluate our research could be used for teaching: from interviews to focus groups to surveys. For core materials, comparing pre- and post-course exams and placements could help to reveal collective success. Student evaluations are best done longitudinally (including after graduation), allowing time for the student to put the class material into perspective and use. These could include surveys of graduating students and alumni, asking about the performance of professors, techniques, and knowledge and skill bases.

It is extremely odd that we accept peer review for our research but do not embrace it as a tool for teaching. Our hesitation relates to the politics of salary review. Some argue that peer review can only be done by someone with sub-disciplinary knowledge (a point with which we disagree), and that a review carried out by a fellow member of the department therefore risks a lack of specialization, conflicts of interest and personal politics. Even if peer review is voluntary and not counted, some department members are likely to have strong suspicions that the information could be used against them in salary review or promotion.

The obvious solution would be to have an outside person, perhaps from a neighbouring university, do the review. This raises the problem of resources and coordination; such a system could be used only occasionally, perhaps every few years. For young scholars who need intensive guidance, pairing them with a mentor would make the most sense. Rather than having teaching and learning resources relegated to poorly attended teaching workshops, they could become an integral part of improving instruction and spreading knowledge across universities.

At the class level, a pre- and post-course test would reveal improvement in student outcomes, rather than simply how students feel about the experience at the moment of angst (student evaluations are often conducted right before a final exam). A 2007 article by Stark-Wroblewski, Ahlering and Brill described such an experiment, conducted in a psychology course. The authors found quite a low correlation between improvement in test scores and student evaluation scores: just 0.18. Such tests could be a valuable instrument for putting claims about teaching results on a less speculative footing. Yet even such arrangements require supplemental peer review observation, to pick up the more subtle aspects such as cohort and individual differences in learning to analyze, interact with peers and make decisions in complex environments.
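For readers who want to try this kind of comparison on their own courses, the following is a minimal sketch, not the method used in the 2007 study. It assumes one pre-test score, one post-test score and one evaluation score per unit of analysis (per student or per course section; that choice is our assumption), and every number in it is invented purely for illustration. The function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def learning_gains(pre, post):
    """Improvement for each unit: post-course score minus pre-course score."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired")
    return [after - before for before, after in zip(pre, post)]


# Hypothetical data, invented for illustration only.
pre_scores = [40, 55, 62, 48, 70, 35]            # pre-course test, out of 100
post_scores = [65, 70, 80, 60, 85, 66]           # post-course test, out of 100
evaluation_scores = [4.5, 3.0, 4.0, 4.8, 3.5, 3.9]  # survey rating, 1 to 5

gains = learning_gains(pre_scores, post_scores)
r = pearson(gains, evaluation_scores)
print(f"Correlation between learning gain and evaluation score: {r:.2f}")
```

A value near zero would suggest, as the study cited above did, that survey ratings and measured improvement are capturing different things. With a handful of data points the coefficient is very unstable, so a real analysis would need a larger sample and a look at the confounding variables mentioned earlier, such as class size and level.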

Dr. Hira is a professor of political science at Simon Fraser University and Ms. Cohen is a graduate student in the department. This article is an excerpt from a report on assessment systems in higher education. The full report including references can be downloaded at www.sfu.ca/~ahira. The authors would like to thank SFU’s Institute for the Study of Teaching and Learning in the Disciplines for their support, and Adrienne Burke, Cheryl Amundsen, and the late great Peter Kennedy of SFU for their comments.