Nobody Asked Me: Grading Better

A few days before the 2020-21 school year began, leadership dictated to Newton teachers that we would be using a new, never-before-seen (by us, at least) grading system. It was difficult to view such a sudden, last-minute mandate positively given that it was only one of a barrage of abrupt policy changes that did not incorporate, or even ask for, teacher input. However, it did establish that grading systems in Newton are not sacrosanct. What would the ideal grading system look like?

What Can (Not Should!) Grades Be Used For?

• Provide feedback to students, families, and other teachers

• Motivate student learning

• Motivate student behavior

• Signal intelligence to other members of the community

• Signal “being a good student/person”

• Sort students/signal to colleges

What Should Grades Be Used For?

• Provide feedback to students, families, and other teachers

Our society has come to view grades, rather than learning, as the primary tangible that students take away from schooling. Grades are what get students scholarships. Grades are what get students into a “good” college. Grades are what students are praised for. High grades are desirable, naturally lending themselves to purposes for which they’re not designed.

The grade has become an all-encompassing symbol of how “good” a student is. If a person is told that a student has straight A’s, that person will likely make a number of assumptions about the student; if a person is told that a student has F’s, another set of assumptions will be made. Grades can then be used as incentives by teachers and parents to motivate desirable behaviors entirely unconnected to how well a student is learning.

Under the traditional A-F grading system, a single letter has come to measure and represent an enormous quantity of unconnected “qualities” in a student. The more characteristics a grade claims to measure, the worse it becomes at measuring any individual characteristic. We’ve arrived at a point where the grade that purports to measure everything, in practice, measures very little.

The “Traditional” A-F System

The appeal of the traditional grading system is readily apparent: it is easy to understand. It is the system that most people grew up with. And, perhaps most insidious, it claims to generate final percentage grades with several-decimal accuracy.

Math, even when it’s garbage, lends legitimacy to systems. The moment someone can point to a calculation that was used to arrive at a decision, they gain credibility among the general audience. This is particularly true in the field of education, with its emphasis on social-emotional learning, student wellness, and other “soft” goals. The primary tool used to arrive at a final grade is the weighted average. For example, perhaps in a math class, homework is worth 10% of the final grade, quizzes are 20%, projects are 30%, and tests are 40%.
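To make the arithmetic concrete, here is a minimal Python sketch of that weighted average. The category weights come from the example above; the student's category scores are invented purely for illustration.

```python
# Category weights from the example above (they sum to 1).
WEIGHTS = {"homework": 0.10, "quizzes": 0.20, "projects": 0.30, "tests": 0.40}


def weighted_average(category_scores, weights=WEIGHTS):
    """Return the weighted average of per-category percentage scores."""
    return sum(weights[name] * category_scores[name] for name in weights)


# Hypothetical category averages (percent) for one student.
scores = {"homework": 95.0, "quizzes": 82.5, "projects": 88.0, "tests": 79.25}

final_grade = weighted_average(scores)
print(f"Final grade: {final_grade:.2f}%")
```

Notice that the output looks precise to two decimal places, yet nothing in it says which skills the student has or hasn't learned.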

But let’s back up; didn’t we want grades to generate useful feedback to students and families about what students are learning? What are skills/content that a math teacher might want their students to take away from their class?

• Exercise-Solving: can use new information to answer well-defined questions that resemble previously encountered ones and have well-defined processes for answering them

• Problem-Solving: can think critically and synthesize information from multiple content areas to solve problems that are unlike situations previously encountered

• Conceptual Understanding: understands what the math they’re doing represents and why

• Mechanics: can manipulate mathematical objects without error

• Organization: can present solutions in a clear, well-organized way such that a reader unfamiliar with the problem can follow what they’re doing

Chances are, each homework assignment, quiz, project, and test had some combination of these five skills. Maybe some had more of one than another, maybe some didn’t have one of the five at all. But under the traditional grading system, each assignment generates an average which does not differentiate among the skills demonstrated. Worse still, each average is then averaged with other averages to generate a final average that has lost any ability to communicate what a student has done well or not.

Consider a student who earns a B. They might earn a B because they’re really good at exercise-solving, conceptual understanding, mechanics, and organization, but haven’t yet developed their problem-solving skills whatsoever. Another student with the same B might instead have demonstrated about 80% mastery of all five areas. Look at ten different students’ B’s, and each of them can represent something different.
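A quick sketch of those two students in Python. It assumes, for illustration only, that the five skills are weighted equally and that the first student's four strong areas are at 100% mastery; neither assumption is part of any real gradebook.

```python
SKILLS = [
    "exercise_solving",
    "problem_solving",
    "conceptual_understanding",
    "mechanics",
    "organization",
]


def overall_percent(mastery):
    """Average the five skill percentages with equal weight (an assumption)."""
    return sum(mastery[skill] for skill in SKILLS) / len(SKILLS)


# Student A: strong everywhere except problem-solving (hypothetical numbers).
student_a = {skill: 100 for skill in SKILLS}
student_a["problem_solving"] = 0

# Student B: about 80% mastery in every area.
student_b = {skill: 80 for skill in SKILLS}

print(overall_percent(student_a), overall_percent(student_b))
```

Both students land on the same overall number, even though one of them has a glaring gap that the other doesn't.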

How does this communicate any useful information to students and families? Only the extremes (100% mastery in all areas, 0% mastery in all areas) are able to be interpreted. Everything else is ambiguous and meaningless, but masquerades as infallible because, well, it’s a number that was obtained with other numbers.

This ambiguity in what a letter grade means allows individuals to assign their own meaning to it. Because it’s a vague average, higher values come to be associated with good things, and lower values come to be associated with bad things – even things that were never measured in the grade!

If the traditional grading system is spitting out meaningless symbols stitched together from disparate parts, what are the other options?

A/B/P/NG

Assuming that the goal of a grading system is to communicate feedback to students and families, the grading system that Newton adopted this year was even worse than the traditional system.

Even if teachers had been given enough time to try to work through what this is supposed to mean (which we weren’t), I suspect it wouldn’t have mattered. Let’s go through some of the major problems with this system before jumping to some solutions.

• I’ve never seen such a vague set of criteria in my life. What is the difference between most and many? Is most greater than fifty percent? If so, would less than fifty percent be “many?”

• Especially being entirely remote for the first chunk of the year, it seems incredibly optimistic to believe that teachers will be able to accurately measure how reflective a student is on their own growth. Even in-person, this is vague and unclear.

• The conditions listed (not defined as 100%-90%, etc.) were actually weaker than the conditions verbalized, which were that teachers should not calculate a number and convert that into a letter grade. For subject areas like math, students demonstrate mastery of skills and learning expectations by doing them. Decoupling measured numerical performance from grade is nonsensical.

• Under this system, if a student turns in work, and a teacher can clearly determine that they understand absolutely nothing, that student receives a P, as NG’s can only be given when a student has not submitted work. This is incredibly detrimental to students who have not yet mastered the content and would benefit from repeating the course or otherwise accessing supports. By passing this subset of students into the next class in the sequence, we are placing them inappropriately and setting them up for a terrible experience.

• It was a bizarre decision to continue to call the A and B boxes “A” and “B” when, as we were reminded often, they bore no connection to the A and B of the traditional system. Conveniently, when the grading committee compared Newton’s performance to surrounding districts, they felt quite comfortable comparing our not-traditional-A’s-and-B’s to other districts’ traditional A’s and B’s.

In a year when so much time and effort was spent on anti-racism and preventing teacher biases from impacting student outcomes, it is confusing that the administration introduced a grading system that encourages teacher bias, since each teacher was forced to largely interpret this system on their own. Students did not understand where their grades came from and teachers didn’t understand how they were supposed to generate grades.

If the primary goal of grading is to provide feedback, this system utterly failed, as it generated either no useful feedback or misleading feedback from which students drew incorrect conclusions.

The Solution – Mastery Grading

Let us return to the aforementioned case of a math teacher who is looking for their students to master five skill areas. Traditional grading makes a mess of everything by averaging apples, oranges, and melons, and calling it a B. This B, being vacuous in meaning, can then be asserted to mean all kinds of things that it doesn’t mean. It can also be misused as an incentive.

The solution, then, is simple: don’t average the individual categories.

Suppose the math teacher gives students an assessment. Instead of setting up a problem like this,

1. Find the equation of the plane that contains the points (1, 2, 3), (1, 3, 5), and (2, 4, 6).

(12 points)

the problem would be set up like this:

1. Find the equation of the plane that contains the points (1, 2, 3), (1, 3, 5), and (2, 4, 6).

Exercise Solving: 4 points

Mechanics: 4 points

Organization: 4 points

Then, at the end of the assessment, instead of a big “87” written at the top of the page, the student would see five scores (Organization: 12/20, Mechanics: 16/20, etc.), each obtained by summing the points earned in that category across all of the problems.
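The tallying itself is simple bookkeeping. Here is a minimal Python sketch, assuming each problem records points earned and points possible per skill. The first problem uses the 4/4/4 split from the example; the other problems and all of the earned points are hypothetical.

```python
from collections import defaultdict

# One dict per problem: skill -> (points earned, points possible).
assessment = [
    {"Exercise Solving": (4, 4), "Mechanics": (3, 4), "Organization": (2, 4)},
    {"Problem Solving": (5, 8), "Mechanics": (4, 4)},
    {"Conceptual Understanding": (6, 8), "Organization": (3, 4)},
]


def skill_totals(problems):
    """Sum earned and possible points per skill, never across skills."""
    totals = defaultdict(lambda: [0, 0])
    for problem in problems:
        for skill, (earned, possible) in problem.items():
            totals[skill][0] += earned
            totals[skill][1] += possible
    return {skill: tuple(pair) for skill, pair in totals.items()}


report = skill_totals(assessment)
for skill, (earned, possible) in report.items():
    print(f"{skill}: {earned}/{possible}")
```

The important design choice is what the function does not do: it never collapses the per-skill totals into a single number.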

The power of this type of grading is substantial. On individual assessments, it gives students far more targeted feedback on where they’re doing well and where they need to put in some more work. When students come to go over a test with a teacher, they’re typically looking for more than a single number. They want the teacher to identify the specific areas that they’re struggling in and what steps they should take to improve – essentially, to break the “87” back into the subcategories.

This system also tracks student growth more effectively than the traditional system. Under the traditional system, a student could improve their problem-solving skills all year, but that improvement is hidden because they bombed on a test or got a couple of zeros on missed assignments, both of which would dominate the average and lower it.
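Here is a small Python sketch of that effect. Every number in it is made up for illustration: one list tracks a student's problem-solving scores across the year, and the other is the same student's full list of assignment scores, including two zeros for missed work.

```python
from statistics import mean

# Hypothetical problem-solving scores (percent) on five assessments, in order.
problem_solving = [55, 65, 75, 85, 95]

# Hypothetical scores on all assignments, with two zeros for missed work.
all_scores = [70, 0, 75, 80, 0, 85, 90]

# Traditional view: one average over everything.
overall_average = mean(all_scores)

# Mastery view: how far one skill has moved from first to last assessment.
growth = problem_solving[-1] - problem_solving[0]

print(f"Overall average: {overall_average:.1f}%")
print(f"Problem-solving growth: {growth} points")
```

The overall average is dragged down by the two zeros, while the steady climb in problem-solving only shows up when that skill is tracked on its own.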

One of the most exciting elements of this system is that there is no one number that students can point to as their capital-g Grade. All the bad things that grades are used for, intentionally or incidentally, are gone. Instead of asking, “What do I need to do to get an A?” students can look at their grade and understand exactly what they need to do, not to get an “A,” but to master a specific skill area. The extrinsic motivation of grades is weakened; it sounds a lot less braggy/signaly (but far more meaningful!) to say, “I have demonstrated mastery of all five skill areas in my math class,” than it does to say, “I have A’s on all my math tests.”

Mastery grading, with its easy ability to track content/skill areas throughout the year, allows students to demonstrate growth. Naturally, this would de-emphasize earlier work in favor of more recent work. Students who struggle with problem solving in Quarter 1 but successfully learn and master it by Quarter 4 shouldn’t be punished for their poor marks in Q1; they should be rewarded for having learned the content! Mastery grading allows for this in an easier manner than the traditional model, with built-in opportunities for students to demonstrate that they’ve mastered something that they hadn’t before.

Finally, the numbers would mean something. People would be unable to assign “good person” to A because, not only do A’s not exist, but the grading system itself spells out exactly what it is grading, and “good person/student/success, bad person/student/failure” is not on the list.

Problems with Mastery Grading

The primary problems with mastery grading are on the implementation side, but all can be overcome. Here is what needs to be done in order to smoothly integrate and implement mastery grading:

• Meaningful pilots must be conducted. Administrators should look for at least one volunteer in each department to try out mastery grading for the full school year, give the volunteers time to meet and discuss, and trust the teachers to develop a model that works best for their own departments and classrooms.

• Teachers need time to discuss, explore, and revise the model in order to find something that works for them individually and for their individual students. This does not mean a one-size-fits-all approach. Nor does it mean that teachers are given one 30-minute meeting to “figure out mastery grading.”

• Course management systems like Schoology and Aspen must be either replaced or updated to allow for mastery-style grading that does not average disparate categories together.

• The energy barrier to teachers implementing this successfully must be as low as possible. This means that our “report card” grades should be automatically pushed out to families without teachers needing to re-enter hundreds of numbers into Aspen from Schoology.

• Teachers must be given time to work together to play with assessments and other mastery grading tools, updating traditional grading tools to mastery grading tools.

• For mastery grading to be done meaningfully, there should be no averaging of the different skill areas together in order to generate a GPA. It fundamentally doesn’t make sense to average five different skill areas in math together, get a number, and then average it with the average of five different skill areas in history, etc.

• Finally, this may sound counterintuitive, but mastery grading should not be mandated, at least for several years. If the model is successful, more teachers will naturally be inclined to use it, and it will spread. Especially in Newton, where trust between teachers and administrators has been destroyed this year, a mandate from on high would likely do more to ensure a program’s failure than its success.

Another concern, from the teacher side, is whether mastery grading requires an unrealistic amount of time to do well. Pilots will provide useful evidence, but most likely, mastery grading will be like most other systemic overhauls – high upfront costs, with long-term costs on par with what we’re doing today, but with much larger benefits.

Parents and students may wonder how colleges would view a student’s mastery grading transcript as compared to a traditional transcript. This would be something for administrators and counselors to research, but it seems strange that a college would be upset that a high school gave them more information about a student. Seeing a transcript with a bunch of letters tells a far less compelling story than being able to track a student’s problem-solving skills from year to year and from class to class.

Grading is complex, but it doesn’t have to be. For too long, what was meant to be a feedback system has been warped into a toxic machine that extracts joy from learning, obfuscates what students are learning, and acts as a perverse external incentive. It’s time to make it work for teachers and students, instead of against everyone.

rynotalk
