Let's Blow up Jupiter: Grading Schemes

I think most people would agree that a grade is (or is supposed to be) some sort of measure of knowledge, skill, and work. Depending on the particular class, it could be almost entirely a measure of knowledge, or of skill, or of work, but usually it is a mixture of all three in varying amounts.

But like all measurements, a grade is an imprecise thing, and a grade is more imprecise than most measurements. For one thing, it is not clear exactly what it is measuring. For the moment, though, let us ignore the really difficult questions and simply accept that grades are (for the most part) assigned from the "percentage" that you have in the class. With these simplifications in mind, a grading scheme is simply a partition of the interval [0, 1] into consecutive pieces, one per grade.
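To make the "partition of [0, 1]" picture concrete, here is a minimal sketch. The cut points and letters are hypothetical values picked only for illustration, not a recommended scheme.

```python
from bisect import bisect_right

# Hypothetical cut points inside [0, 1]; they split the interval into
# five consecutive pieces, one per letter grade.
CUTOFFS = [0.60, 0.70, 0.80, 0.90]
LETTERS = ["F", "D", "C", "B", "A"]

def letter_grade(percentage, cutoffs=CUTOFFS, letters=LETTERS):
    """Return the letter whose piece of [0, 1] contains `percentage`."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must lie in [0, 1]")
    # bisect_right counts how many cutoffs are <= percentage, which is
    # exactly the index of the piece the score falls in.
    return letters[bisect_right(cutoffs, percentage)]

print(letter_grade(0.85))
```

A score that lands exactly on a cut point goes to the higher grade here; that is a design choice of the sketch, and the opposite convention would work just as well.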

Before we discuss how you might decide on a grading scheme, we should talk about the distribution of grade percentages. A grade percentage for an individual is a weighted arithmetic average of their scores on the various exercises, homeworks, tests, and so on over the course of the class. Whatever the distributions of performance on the individual homeworks and tests look like, in the limit of a large (very probably an unrealistically large) number of such tasks, the central limit theorem says we can expect the distribution of the total grade to look roughly Gaussian (the familiar bell curve), provided the tasks are reasonably independent of one another.
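The weighted average and the bell-curve claim can both be poked at with a small simulation. This is only a sketch under assumptions of my choosing: every task score is drawn independently and uniformly from [0, 1] (a shape that looks nothing like a bell curve), all tasks carry equal weight, and the function names are made up for the example.

```python
import random
import statistics

def weighted_percentage(scores, weights):
    """Weighted arithmetic mean of per-task scores, each in [0, 1]."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("need one weight per score")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def simulate_totals(n_students=20000, n_tasks=30, seed=0):
    """Total percentages when every task score is uniform on [0, 1].

    Uniform, independent task scores and equal weights are assumptions
    made to keep the sketch short.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_tasks
    return [
        weighted_percentage([rng.random() for _ in range(n_tasks)], weights)
        for _ in range(n_students)
    ]

totals = simulate_totals()
mean = statistics.fmean(totals)
sd = statistics.pstdev(totals)
# For a Gaussian, about 68% of values fall within one standard deviation.
within_one_sd = sum(abs(t - mean) <= sd for t in totals) / len(totals)
print(round(mean, 3), round(sd, 3), round(within_one_sd, 3))
```

If the totals are roughly Gaussian, the last number printed should sit near 0.68 even though no single task score is bell shaped.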

Each individual student's score should be thought of as merely an indicator of the nature of that student's performance distribution. By performance distribution I mean the distribution of percentage grades that a million copies of the student would get if they were all to take the class. Obviously each copy would do differently in the class, and there are many sources of variance. For instance, sometimes the teacher will make material that a student has studied intensively the main subject of an exam, and sometimes the student will study an area that has almost no bearing on the exam questions. Since both tests are equally valid (or at least are conceivably equally valid), we must account for effects like these in our framework for considering grades. Each student also has an individual variance, meaning that a person will do differently on the same test given under the same conditions.

None of this amounts to much if we assume that a student's performance distribution is more or less the same across different tasks and exams. In that case, no matter what the standard deviation of the underlying distribution is for individual assignments and tests, we can make the distribution of the total grade arbitrarily narrow, because the spread of an average of n independent scores shrinks like 1/sqrt(n). In the limit where we average over a large number of assignments and exams, each student's total percentage grade converges on their average.
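The narrowing can be checked numerically. In the sketch below, each "copy" of a student gets task scores drawn independently from the same Gaussian; the mean of 0.75 and per-task spread of 0.10 are hypothetical numbers, and for independent tasks the spread of the average should fall off like task_sd / sqrt(n).

```python
import random
import statistics

def spread_of_total(n_tasks, true_mean=0.75, task_sd=0.10,
                    n_copies=5000, seed=1):
    """Standard deviation of the total grade across simulated copies.

    Each copy's task scores are drawn from a Gaussian with the same mean
    and spread, an assumption standing in for "the same performance
    distribution on every task". Scores are not clipped to [0, 1].
    """
    rng = random.Random(seed)
    totals = [
        statistics.fmean(rng.gauss(true_mean, task_sd) for _ in range(n_tasks))
        for _ in range(n_copies)
    ]
    return statistics.pstdev(totals)

for n in (1, 4, 16, 64):
    # Simulated spread next to the theoretical value task_sd / sqrt(n).
    print(n, round(spread_of_total(n), 4), round(0.10 / n ** 0.5, 4))
```

Quadrupling the number of tasks should roughly halve the spread, which is also why the "unrealistically large" number of tasks matters: the narrowing is slow.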

For our first foray into grading schemes, then, we will make the further simplifying assumption that each person's score represents a distribution which is a delta function at their score (a delta function is an idealized spike, a bump with arbitrarily small width but a non-vanishing area), so that each individual deterministically gets some characteristic score. This is actually the model that is used in practice. The assumption is that either the spread of an individual student's performance distribution is small compared to the gaps between students' scores, or that information about that spread is unobtainable, unusable, or unimportant.

The most straightforward way to make a grading scheme is simply to pick an arbitrary minimum passing percentage and then divide the remaining space evenly into the grade levels. Or, equally easily, just pick arbitrary levels to correspond to the different grades. The only problem with this sort of approach (and of course some would say it is not a problem at all) is that if you don't look at the scores at all when you set the levels, then two people whose class percentages differ by 0.01% can receive different grades. If you really believe that the intrinsic spread in an individual's total grade is less than 0.01%, then you can at least be confident that the grade difference represents an actual difference. However, even when it does, the difference is so small that it is arguably unfair to give the student who is 0.005% above the dividing line a B and the student who is 0.005% below it a B-.
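Here is a rough sketch of the preset approach: choose a passing percentage, divide what is left evenly, and watch two nearly identical students land on opposite sides of a line. The 60% passing level and the four passing letters are hypothetical.

```python
def even_cutoffs(passing, letters):
    """Lower bounds for each passing grade, spaced evenly over [passing, 1].

    `letters` runs from the lowest passing grade to the highest; anything
    below `passing` fails.
    """
    width = (1.0 - passing) / len(letters)
    return [(passing + i * width, letter) for i, letter in enumerate(letters)]

def assign(percentage, scheme, failing="F"):
    """Highest grade whose lower bound the percentage meets."""
    grade = failing
    for lower, letter in scheme:
        if percentage >= lower:
            grade = letter
    return grade

scheme = even_cutoffs(0.60, ["D", "C", "B", "A"])
# Two hypothetical students 0.01 percentage points apart, straddling a line.
print(assign(0.79995, scheme), assign(0.80005, scheme))
```

The two students differ by 0.0001 as a fraction of the class total, yet the scheme has no way of noticing how close they are.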

If you are the sort of teacher who decides on percentage levels for grades before the percentage distribution of the class is known, then the usual answer to this problem is simple: you monitor the distribution of the class as it goes along and give extra credit and/or easier or additional assignments as necessary to guide it to something sensible. I view this as a quick fix rather than an answer to the fundamental question. Less effective but more logical is the tactic of reviewing individual scores that are close to the borderline and looking for reasons to bump the student over the line and give them the higher grade.

If on the other hand you are grading "on the curve", then you are free to take into account all the details of the distribution of grades that came out of a class. For small class sizes (say below 100) it may or may not make sense to grade on a curve, but for large classes with high difficulty levels it becomes a necessity. It can be difficult to gauge the difficulty of an exam or homework accurately, so in a particular semester a professor may give much harder or much easier tests on average. Under a preset grading scheme these differences in difficulty cannot be taken into account, and so the same level of performance in the class can result in two very different grades. Moreover, one student taking the class when the harder exams were given might achieve a C while a less capable student taking the same class during an easier semester achieves a higher grade. If the difficulty level of the class cannot be kept relatively constant (as is the case in most physics courses), then it is best to "grade on the curve".

At this point the question arises: "what is the fairest grading scheme?" For the moment let's ignore the variance/uncertainty in an individual's percentage. Furthermore, let us first consider only the relative performance of individuals and leave any absolute measure of performance out of the picture. We take it to be fair that a person's grade reflects their percentage: a higher percentage represents a better performance and therefore should correspond to a higher grade. From this perspective we are in a sense under-rewarding those who are at the high end of a grade level and over-rewarding those who are at the low end. In some way or other, then, we want to minimize the average difference between low and high percentages within one letter grade level. A very natural way to minimize this "unfairness" is simply to minimize the average difference between any two percentages in a grade level. Going one step further, we might minimize the sum of the squares of the differences of the scores, because then we are dealing with the familiar object of standard deviation.
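One reading of this criterion, and it is only my interpretation, is to split the sorted percentages into contiguous groups so that the total within-group sum of squared deviations from each group's mean is as small as possible. For a fixed number of groups this can be solved exactly with a small dynamic program; the function name and the sample scores below are made up for illustration.

```python
def best_partition(scores, n_groups):
    """Split sorted scores into contiguous groups minimizing the total
    within-group sum of squared deviations from each group's mean."""
    xs = sorted(scores)
    n = len(xs)
    if not 1 <= n_groups <= n:
        raise ValueError("need between 1 and len(scores) groups")

    # Prefix sums give the cost of any run xs[i:j] in constant time.
    pre = [0.0] * (n + 1)
    pre_sq = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        pre[i + 1] = pre[i] + x
        pre_sq[i + 1] = pre_sq[i] + x * x

    def cost(i, j):
        total = pre[j] - pre[i]
        return (pre_sq[j] - pre_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting xs[:j] into k groups.
    best = [[inf] * (n + 1) for _ in range(n_groups + 1)]
    split = [[0] * (n + 1) for _ in range(n_groups + 1)]
    best[0][0] = 0.0
    for k in range(1, n_groups + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    split[k][j] = i

    # Walk the recorded split points back from the end.
    groups, j = [], n
    for k in range(n_groups, 0, -1):
        i = split[k][j]
        groups.append(xs[i:j])
        j = i
    return groups[::-1]

sample = [0.91, 0.62, 0.90, 0.75, 0.60, 0.74, 0.93, 0.76]
for group in best_partition(sample, 3):
    print(group)
```

The sketch only decides where the lines fall for a given number of grade levels. It says nothing about how many levels there should be or which letter each group deserves, so the absolute measure of performance is still left out of the picture. The triple loop is slow for very large inputs but is fine for a class of a few hundred.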

In my most recent actual grading session we in fact set rough places where the grade divisions should be, and then went through the percentages looking for large gaps between scores near each rough division where we could put the line. Then we looked at the individual performances of the people on each side of the line and considered moving the line up or down.
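The mechanical part of that procedure might be sketched like this. The `window` parameter (how far from the rough division we are willing to look) is a hypothetical knob that I am introducing, and the final step of looking at individual students is a human judgment that the code does not attempt.

```python
def snap_to_gap(scores, rough_cutoff, window=0.02):
    """Move a rough cutoff to the middle of the widest gap near it.

    Only gaps between consecutive scores that both lie within `window`
    of the rough cutoff are considered. Returns the rough cutoff
    unchanged if no such pair of scores exists.
    """
    nearby = sorted(s for s in scores if abs(s - rough_cutoff) <= window)
    best_gap, best_line = 0.0, rough_cutoff
    for low, high in zip(nearby, nearby[1:]):
        if high - low > best_gap:
            best_gap, best_line = high - low, (low + high) / 2
    return best_line

# Hypothetical class percentages around a rough B/C division at 0.80.
scores = [0.70, 0.783, 0.788, 0.791, 0.806, 0.809, 0.815, 0.90]
print(snap_to_gap(scores, 0.80))
```

With these made-up scores the widest nearby gap is the one between 0.791 and 0.806, so the line moves slightly below the rough 0.80.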

One problem with this gap method is that when you have 300 people in a class, a "large" gap is on the order of half a percent. But if we were instead to minimize the standard deviation within each grade group, we would no longer depend on finding a clean gap: a single score sitting in the middle of what would otherwise be a gap won't throw you off.

I am not sure how well the grading scheme proposed above would actually work, but I am eager to find out. Perhaps I shall propose it to the next professor under whom I TA.

Wednesday, December 23, 2009
