Marking gets an upgrade — University Affairs

It was a November day in 2011 when 15 boxes arrived at the Bahen Centre at the University of Toronto. They were stuffed with 5,000 16-page exams from the Canadian Open Mathematics Challenge, and James Colliander had to get them marked.

He ordered pizza and assembled 100 markers – a mix of faculty, postdocs, grads and undergrads, mostly from the school’s mathematics department, where he’s a professor. Together, they put in a tedious eight-hour day on this volunteer task. The marking was repetitive, and because each grader focused on just one section, there was a lot of waiting: the person working on section D had to wait for section C’s grader to finish, and so on down the line. Pages were constantly being flipped and booklets moved around. “It was a logistical nightmare,” recalls Dr. Colliander.

But that day triggered an idea: what if this kind of co-marking, which happens all the time for large exams in mathematics and the sciences, could be transferred online? No waiting for other sections of the paper to be done. The opportunity to cut and paste often-used comments. No red pens and pizzas.

So Dr. Colliander started working with mathematician and software developer Martin Muñoz to develop Crowdmark, an online software platform. With the help of MaRS Innovation, an agency that facilitates the commercialization of university research, the two formed a company. In November 2012, Dr. Colliander orchestrated the marking of that same exam, recruiting 150 markers from across Canada to log onto a beta version of Crowdmark. His new team pushed through the same number of tests in 340 person-hours, compared with 700 the year before.

Now, Crowdmark is being used in educational institutions in 36 countries, including 25 universities. It’s allowing teams to take paper exams on just about any subject, scan them into a computer and mark them collaboratively. Bonus functions include allowing a course director to chart the accuracy of graders, fend off cheating and change marks quickly when a student successfully disputes a grade.

This new piece of software is just one innovation changing how academics mark. Changing ideas about learning, combined with technology, are making this significant – and often time-consuming, stressful and even boring – part of the job faster, fairer and better geared to learning.

Creating assignments and tests and marking them endures as one of the biggest challenges of the teaching side of academia. And it takes a lot of time. “There have been so many times I’ve missed out on being with my friends or family, particularly in the spring when it’s lovely outside, because I have thousands of papers to grade,” says Dr. Colliander.

For Anne McNeilly, associate professor of journalism at Ryerson University, the weekly assignments for her first-year students translate into endless marking. “I feel like I’m working all the time,” says Ms. McNeilly. She takes papers wherever she goes, marking at the doctor’s office or at the arena while her son skates. In the morning, she’ll polish off a few papers before leaving for work.

It takes Erin Aspenlieder about 20 minutes to mark an eight-to-10 page essay – just part of the work for which she’s paid a full-time salary. But Dr. Aspenlieder, educational developer with the office for open learning and educational support at the University of Guelph, is concerned about part-time faculty who juggle four or five courses and get little support. “That takes time away from doing things which might get you off the sessional treadmill, like research.”

And the time spent marking is often fraught. When the paper or exam gets handed back, there’s often a lineup during office hours. “It’s exhausting to defend your grades,” says Dr. Aspenlieder. “Especially when the reasons they give are so amazing: ‘Please give me this grade, I need to get into vet school.’” With class sizes on the rise, the challenges of marking demand solutions.

The latest developments in marking fall into two categories: technology-based and pedagogically based innovations. On the technology side, advances over the years mean that hundreds of paper-based, multiple-choice exams can now be marked automatically with little human intervention. That may be ideal for instructors in early-year survey courses, but it isn’t ideal for students.

“Multiple choice is a great idea if you want to test one’s knowledge of trivia and useless information,” says David DiBattista, professor emeritus in psychology at Brock University. Since the late 1990s he’s been researching this form of testing and has concluded that most professors write poor questions that don’t enable learning. He now spends his time consulting with textbook companies – he calls the test items in most textbooks “junky” – and doing presentations on how to avoid testing for trivia and get to higher-level thinking. (His tips on question creation include: avoid “all of the above” and “none of the above” as answers; write questions that are in a question format, not sentence completion; and avoid negatives in questions.)

Dr. DiBattista also shares his statistical analysis approach for weeding out poor-performing questions and advocates for professors putting more time into writing, reviewing and revising their tests. “We have an ethical obligation to do this work. If we are measuring our students’ performance and have no idea how well our tests are doing, it’s entirely possible we are measuring very badly.”
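The article does not spell out Dr. DiBattista's own method. As a rough illustration of what weeding out poor-performing questions can look like, the sketch below uses classical item analysis, a common textbook approach: for each question it computes a difficulty index (the share of students who answered correctly) and a discrimination index (the correlation between getting that question right and the score on the rest of the test). The function name `item_stats` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def item_stats(responses):
    """Return (difficulty, discrimination) for each question.

    responses: one list per student of 0/1 scores, one entry per question.
    This is a generic classical item analysis, not any specific
    researcher's procedure.
    """
    n_items = len(responses[0])
    totals = [sum(student) for student in responses]
    stats = []
    for i in range(n_items):
        item = [student[i] for student in responses]
        # Difficulty index: proportion of students who got the item right.
        difficulty = mean(item)
        # Score on the rest of the test, excluding this item.
        rest = [total - x for total, x in zip(totals, item)]
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            # Everyone answered the same way: the item cannot discriminate.
            discrimination = 0.0
        else:
            products = [a * b for a, b in zip(item, rest)]
            covariance = mean(products) - mean(item) * mean(rest)
            discrimination = covariance / (sd_item * sd_rest)
        stats.append((difficulty, discrimination))
    return stats


# Hypothetical results: five students, four questions.
sample = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
for number, (difficulty, discrimination) in enumerate(item_stats(sample), 1):
    print(f"Q{number}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```

In this made-up data set, the first question is answered correctly by everyone and so tells the instructor nothing, while the last has a negative discrimination value, meaning stronger students were more likely to miss it. Both would be candidates for review.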

But in the digital era, testing formats themselves may change. “Multiple choice testing is on its last legs,” asserts Mark Gierl, professor of educational psychology and holder of the Canada Research Chair in Educational Measurement at the University of Alberta. He researches and uses digital exams and thinks they will eventually become the norm; he says the switch will save schools millions of dollars, since dealing with paper eats up two-thirds of most institutions’ assessment budgets.

Computers also allow for multi-part answers, drop-down menus, drawing or manipulating maps, videos or sound clips, graphs and other visuals. “A computer is a very powerful tool,” says Dr. Gierl. Duplicating a paper-based format on a computer “is kind of a waste of having it online in the first place,” he says. The software will mark the paper as the student writes it. (For more on a software program that specializes in math and sciences, see “Maple T.A.’s web-based testing for math courses” at the end of the article.)

Algorithms can mark just about anything, and well. To test the effectiveness of the array of computer-based testing platforms on the market, in 2012 the Hewlett Foundation staged the Automated Student Assessment Prize, offering $60,000 to the product that could most reliably mark short-answer essays. The results: all nine software entries met or exceeded the marking accuracy of human graders. Dr. Gierl is not surprised. “This is just a sliver of what computers can do right now,” he says.

Such programs have been widely adopted by private organizations offering large-scale testing of English-language competency and by the company that provides the Graduate Record Examinations, or GRE. Universities, on the other hand, have been much slower to bring computers on board to mark tests or long essays – although the data show computers are capable of both. “We prefer humans to score essays,” says Dr. Gierl.

Technology aside, the best new ideas in assessment are those that are falling in line with emerging pedagogical research. That includes a shift towards looking at outcomes as a more integral part of course design.

“It used to be we’d think about what content we’d want to cover. Now we’re asking, what do you want students to be able to demonstrate by the end of a course?” says Donna Ellis, director of the Centre for Teaching Excellence at the University of Waterloo. She runs a four-day course-redesign academy that’s driven by the idea of course outcomes and how they influence assignments. The program caused one faculty member, for instance, to realize that the final project and final exam for the course were assessing the same skill set.

Dr. Ellis has found the learning-objective approach makes creating rubrics and designing and weighting assignments much more straightforward. Beyond that, Dr. Ellis encourages assessments that don’t simply measure skills and knowledge, but are teaching tools themselves.

Meanwhile, studies are showing that students respond best to quick feedback. “The frequency of feedback is more important than getting high quality or individualized feedback,” says U of Guelph’s Dr. Aspenlieder.

The combination of these two ideas is leading to a rise in so-called formative assignments: smaller projects throughout the term that offer a stepping stone to later projects. Assignments such as quizzes on readings or writing essay outlines and bibliographies give students the tools to succeed at the summative assessments like final essays, presentations and exams.

Ms. McNeilly, the journalism professor at Ryerson, came across an approach called minimal marking, first proposed in the 1980s, as a way to separate language mechanics from style and content. She finds her students’ papers are often riddled with grammar, punctuation and other basic errors. “They’re really bright, capable students, but they never learned this stuff,” she says. “They don’t know what a subject or a verb is.”

She has been following research that shows children who know their multiplication tables do better at advanced mathematics and she feels the same is true of basic grammar: “When you learn the basics, only then do you have the scaffolding to express yourself more articulately.” Moreover, other studies, she says, show that people marking essays often don’t look past those distracting surface errors to assess the content of a paper.

She admits the name “minimal marking” is misleading – the technique actually takes longer than the usual way of correcting errors (for more details, see “Minimal marking: how it works” at the bottom of the article). But it has helped her separate content from errors and gets students motivated to finally bone up on their grammar.

Meanwhile, the desire to speed up marking is creating interest in student-peer assessment, which in theory can allow an instructor to offer students more regular feedback. “Students think peer review just makes the job easier for the instructor, but in fact it helps raise the bar. There’s some meta learning that goes on when you’re reading someone else’s paper,” says Jason Thompson, an instructional technology specialist at the University of Guelph who helps run an online peer-assessment tool called PEAR.

But peer work still takes time: instructors need to set up an online project (or draft a process for an in-person peer project), and they need to spot-check the peer assessments or even mark them once the project is done. Rigour at all stages of a peer-assessment process is a key factor, as there are huge potential pitfalls: students could be offering unhelpful feedback, some may disrespect the process and simply toss out comments or grades, and charges of unfairness could be raised during a grade challenge. Still, for all its drawbacks, this approach is grabbing hold. On a single day in early 2014, 760 assignments were in progress on Guelph’s PEAR system.

In a similar vein, University of Toronto Scarborough professor Steve Joordens and PhD student Dwayne Pare have developed peer-grading software called peerScholar. The software distributes submitted student work to a number of peers, each of whom provides a grade as well as feedback. Dr. Joordens recently used the software successfully in a massive open online course he was teaching.

Showing similar promise is the idea of using peer work on exams. After experiencing a difficult class in 2009, Marion Caldecott started using rubrics. The limited-term assistant professor in the department of linguistics at Simon Fraser University also looked into the idea of group exams to make sure students truly comprehended her very technical material.

A grant from SFU allowed her to run a study last year on a third-year class. For weekly quizzes and two mid-term tests, students would take the short-answer test alone for 85 percent of the grade. Then Dr. Caldecott would hand out another copy of the same test and students would gather in groups of five and take another run at it, debating the answers and filling out the test as a group. These results counted for 15 percent.

“I’d overhear a lot of really good discussions and reasoning skills being demonstrated as they did the tests,” says Dr. Caldecott. Students weren’t thrilled with the idea at the outset but by the end of the term, surveys were overwhelmingly positive, and grades went up as well. It meant a few more papers for Dr. Caldecott and her marker to grade, and a bit of extra math on a spreadsheet before entering grades into the learning management system, but otherwise it was a total success. “I tell everyone I know about group exams. It’s so awesome.”
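The "extra math on a spreadsheet" amounts to a weighted average of the two attempts. A minimal sketch, assuming both attempts are scored as percentages and using the 85/15 split described above (the function name `combined_score` is hypothetical):

```python
def combined_score(individual_pct, group_pct, individual_weight=85):
    """Blend an individual and a group test score.

    individual_weight is the percentage of the final grade that comes
    from the individual attempt; the group attempt gets the remainder.
    """
    if not 0 <= individual_weight <= 100:
        raise ValueError("individual_weight must be between 0 and 100")
    group_weight = 100 - individual_weight
    weighted_sum = individual_weight * individual_pct + group_weight * group_pct
    return weighted_sum / 100


# Hypothetical student: 70 percent alone, 90 percent with the group.
print(combined_score(70, 90))
```

For that hypothetical student the arithmetic is 0.85 × 70 + 0.15 × 90 = 59.5 + 13.5 = 73, so a strong group result lifts the mark by three points. With only 15 percent of the weight, the group attempt can never move a grade by more than 15 points.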

Maple T.A.’s web-based testing for math courses

Of the numerous computer-based testing products on the market, one of the most innovative comes from a company launched by University of Waterloo professors. Maple T.A. allows instructors to write questions directly onto a computer using math notation; the students answer in kind. Its marking algorithms account for variations in correct answers and for different ways of notating the same answer (for instance, x+y is the same as y+x). The software, which is used by an estimated 100,000 students around the world, can also offer so-called adaptive answers, where a student who’s getting everything wrong is dropped down to an easier set of questions. “You can use that for placement testing, or just for homework and practice testing,” says Paul DeMarco, director of development for Maplesoft, which makes the software.
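How Maple T.A. decides that x+y and y+x are the same answer is not described here. One simple way to illustrate the idea, which is an assumption for this sketch and not Maplesoft's method, is to evaluate both expressions at a batch of random values and accept them as equivalent only if they agree every time. The function name `equivalent` is hypothetical.

```python
import random


def equivalent(expr_a, expr_b, variables, trials=20, tol=1e-9, seed=0):
    """Numerically test whether two arithmetic expressions agree.

    Illustrative only: it uses eval(), which is not safe for untrusted
    input. A real marking system would use a proper math parser or a
    computer algebra system.
    """
    rng = random.Random(seed)
    code_a = compile(expr_a, "<answer>", "eval")
    code_b = compile(expr_b, "<key>", "eval")
    for _ in range(trials):
        # Pick a random value for every variable in the question.
        values = {name: rng.uniform(-10, 10) for name in variables}
        result_a = eval(code_a, {"__builtins__": {}}, values)
        result_b = eval(code_b, {"__builtins__": {}}, values)
        scale = max(1.0, abs(result_a), abs(result_b))
        if abs(result_a - result_b) > tol * scale:
            return False
    return True


print(equivalent("x + y", "y + x", ["x", "y"]))  # same answer, reordered
print(equivalent("x - y", "y - x", ["x", "y"]))  # looks similar, is not
```

Random testing of this kind can be fooled by expressions that differ only at rare points, which is one reason a symbolic comparison is generally the more robust choice.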

Minimal marking: how it works

With minimal marking, the instructor simply uses some kind of notation (Ryerson’s Anne McNeilly uses small circles) at the end of each line of a paper that contains an error. Three errors on that line, three circles. Ms. McNeilly also fills out a rubric and gives the student a mark, but keeps that piece of paper to herself at first. She hands back the paper, and the student must then figure out what the problems are and fix them to get the mark. Ms. McNeilly ran a pilot program in 2012 and 2013 using four first-year reporting classes, two as controls, to test the effectiveness of minimal marking. “They couldn’t figure out what the problem was,” she says of her students at the outset. So on top of about 15 minutes of marking time per paper, Ms. McNeilly was fielding dozens of writing mechanics questions during office hours. But that workload diminished by the end of term as students began to master grammar and punctuation. Her study, in the end, showed the minimally marked students got three times as many A’s and B’s as those in the control groups.

3 Comments

James Colliander / October 8, 2014 at 10:05

Thanks for the article! Just a small correction: there were 5,000 16-page exams evaluated at the University of Toronto during the 2011 Canadian Open Mathematics Challenge.

David Calverley / October 17, 2014 at 09:00

An interesting article. Much of what is innovative at the post-secondary level has been part of the shift in assessment at the elementary and secondary level for the last 15 years: rubrics, exemplars, telling students what they need to do in an assignment (success criteria), formative assignments, and chunking larger assignments. However, it is good that different assessment strategies are filtering to the universities. The profs/instructors in this article should be commended.

Two problems exist at the post-secondary level: excessive class size and lack of training for TAs/profs. When class sizes were small, professors had more time to work with students. Now, many classes are excessively large and the grading load is correspondingly heavy. However, much of this could be handled by taking the time to train TAs to assess effectively. They do much of the grading in first (and sometimes second) year courses. It doesn’t strike me as unreasonable to expect a TA to take a 1 week (25-30 hour) course on assessment that is particular to their subject discipline. I would recommend similar training for professors and instructors. Grading is a big part of the job, but it is taught to very few profs in a structured manner.

Arun Mukherjee / October 30, 2014 at 09:00

As class sizes have become larger, the functions of lecturing/teaching and grading have been uncoupled and grading farmed out to cheaper markers. These markers may not even have read the course content. Marking, thus, is being treated as a mechanical activity. In my case, marking is often followed by a one-on-one meeting with the student who is unhappy with his/her grade. Learning about their problems so they can do better next time is an important aspect of pedagogy. Sometimes a rewrite after the meeting is the best outcome for a student who wants to practice and improve. These rewriters have often thanked me profusely, saying that they had never been told about their writing skills before.