I’ve been a university professor since 1991. Being a university professor is a very weird job, because your performance is frequently measured in ways that don’t necessarily quantify whether you’re doing your job well. The job essentially has three responsibilities: (1) teach effective classes in your discipline, (2) do world-class research that contributes new knowledge and insights to your discipline, and (3) help run the institution at which you are employed and help run the professional components of your scholarly discipline. Each time I was considered for promotion, and every year when the size of my merit raise is determined, my performance in these three areas (teaching, research, service) is supposedly what determines my success.
At most universities, only research output is valued. Dartmouth is different and quite possibly unique: I’ve written before about what it takes to get tenure at Dartmouth, where research and teaching are weighted equally in that evaluation, and your research is held to the standards of the best research universities, and your teaching is held to the standards of the best undergraduate teaching colleges. However, the way my teaching effectiveness is evaluated is one of the primary causes of grade inflation. In this post, I will explore how we might change the incentives of faculty to reduce the causes and consequences of grade inflation and thereby provide better educations to our students.
The rationale presented here summarizes the conclusions made by the committee I was on that considered the causes, consequences and solutions to grade inflation. You can see a description of our proposal at dartblog.com (click here for the pdf of our full proposal).
My superiors continually scrutinize my scholarly work to measure my research productivity in order to determine raises and promotions. Each time I was up for promotion, they wrote to 10 internationally renowned scientists in my field to evaluate papers I had written, to gauge my reputation as a scientist, and to expound on my level of research output. These letters provided direct statements about the substance and quality of my research. (I’ve written a lot of these letters about other people and read a lot of these letters in evaluations of other faculty, so I know what I’m talking about.) This evaluation can be sidetracked on issues of quantity and not quality (e.g., number of papers published regardless of whether any of them were any good), but for the most part, I think this does a good job of evaluating research quality.
However, in evaluating my teaching for promotion and tenure, the metrics that were used basically evaluated whether the students liked me. When someone is being considered for promotion to Associate Professor with tenure, Dartmouth invites 65 students who took classes from the faculty member to write evaluation letters of their experiences. These letters are uniformly thoughtful. However, faculty reading these letters to make departmental recommendations or to make the ultimate decisions on the institution’s promotions committee (at Dartmouth this is called the Committee Advisory to the President) read mainly to see whether the student liked the class, and little more. Typically little discussion occurs about whether the students’ comments demonstrate that the course was rigorous or challenging or useful. (I have served on the CAP, and I have participated in many departmental meetings–including chairing many–to consider promotion and tenure cases both here and at other institutions.) The other metrics that are used in these deliberations are essentially (1) do students enroll in classes that the faculty member offers, and (2) do students state that they liked the class and thought the class was useful to their education on the standardized evaluation forms that students complete at the end of the term. All of these are essentially measures of popularity. The way these student letters are typically read and the way these metrics are typically evaluated do not quantify whether the faculty member is actually an effective educator, how rigorous the courses are, or what standards the faculty member holds students to in those courses.
As a result, faculty do all kinds of things to increase their popularity with students, and these things either directly or indirectly decrease the level of academic rigor of their courses. Grade inflation is the prime symptom of this erosion of academic rigor caused by the popularity contest in which faculty are engaged. The popularity contest takes many forms:
I need lots of people to enroll in my classes. A very easy way to increase enrollment in a class is to develop the reputation that you are an easy grader or that the class is easy. For example, students at Dartmouth circulate something known as the “layup list”, which is a list of classes where a high grade is assured for little or no work. Not surprisingly, many of these have the highest enrollments at Dartmouth.
In fact, many departments knowingly or unknowingly put pressure on faculty to increase their enrollments. This desire flows from the incentives to garner more resources for the department; resources follow butts in the seats. The way most faculty try to accomplish this is to decrease the academic rigor of their classes by either dropping harder material or grading more leniently. As a consequence, departments and administrators are directly forcing their faculty to decrease the rigor of their courses.
This is a very perverse incentive, because in the grand scheme, departments have very little ability to change enrollment patterns across disciplines. In our analysis of enrollments and majors, our committee compared numbers of majors and numbers of students taking classes in all departments and programs since 1990. We compared these numbers for Dartmouth students to national data on the number of Bachelor’s degrees awarded in the United States, taken from the National Science Foundation’s WebCASPAR Population of Institutions database. The results of this analysis showed that every academic unit at Dartmouth except one (Economics) closely follows national trends in enrollments (you can see representative results of this analysis in our report). For example, Computer Science enrollments both at Dartmouth and nationally closely follow the fortunes of the tech industry up and down, whereas English enrollments have been slowly declining nationally and at Dartmouth since 1990.
Certainly, individual faculty members can make their courses really popular with students by decreasing their academic rigor and grading very leniently, but changing a course to increase enrollments while maintaining rigor is nearly impossible.
Moreover, abundant data from many different sources show that students work less in courses where the grades are higher. Thus, giving higher grades is a direct path to decreasing student effort and academic rigor in classes.
I’ll get better student evaluations if I give high grades to everyone. A substantial academic literature exists linking student evaluations to grade inflation. This linkage exists because of the way student evaluations are read by promotion & tenure committees and by administrators determining merit raises for faculty. Currently, what a faculty member needs is a pile of adoring student evaluations with no dissenters among them. These evaluations are not read to parse the rigor of the courses or the standards to which students are held.
So let’s think for a minute about what student evaluations should look like in an academically rigorous class. In a rigorous class, some students should love it, some students should relish the challenge that the course presented to them, and some students might hate it because they did not relish that challenge. Students who are looking for that rigor will value being pushed to excel and being academically challenged as much as possible. However, some students will not appreciate or value being made to work. Thus, if a faculty member is teaching rigorous courses, one should expect to see among their student evaluations a broader range of opinions–including some decidedly negative opinions.
When I have served on evaluation and tenure committees, I have always wanted to see this range of opinions among the evaluations, and I have always been skeptical of people who receive uniformly high ratings from students. To me, the range and variation in student evaluations is more important than the mean score. Some students should be uncomfortable in your class if you’re doing your job correctly, which means that every faculty member should get some bad reviews. But if the instructor is doing their job correctly, they should also receive many glowing reviews about how challenged the student felt and how rewarding that challenge was to the student’s education. This means that the statistics of student evaluations for a rigorous course should have (1) a range that extends from top scores to low scores, which will cause (2) a large variance in scores and (3) a moderately high (but by no means perfect) average evaluation score.
It is very easy to read student reviews to evaluate the rigor and academic standards of a class, if students are asked the right questions and if the answers are read correctly. The worst reading of student evaluations is simply as a measure of whether the students “liked” the course and the instructor.
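To make those three statistics concrete, here is a minimal sketch in Python. The 1-to-5 rating scale, the function name `summarize`, and both sets of scores are invented purely for illustration; they are not data from our committee's report or from any real course.

```python
from statistics import mean, pvariance


def summarize(scores):
    """Return the range, population variance, and mean of evaluation scores."""
    return {
        "range": max(scores) - min(scores),
        "variance": pvariance(scores),
        "mean": mean(scores),
    }


# Hypothetical ratings on an assumed 1-5 scale (made up for illustration).
rigorous_course = [5, 5, 5, 5, 4, 4, 4, 3, 2, 1]  # mostly high, a few unhappy students
popular_course = [5, 5, 5, 5, 5, 5, 5, 5, 5, 4]   # uniformly adoring

for name, scores in [("rigorous", rigorous_course), ("popular", popular_course)]:
    print(name, summarize(scores))
```

In this made-up comparison, the first course shows the wide range, larger variance, and moderately high mean described above, while the second shows almost no spread at all. Reading only the mean would rank the second course higher, which is exactly the reading I am arguing against.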
I don’t want to interact with students complaining about their grades. Years ago, I was in a meeting of all the department chairs at Dartmouth (I was the Chair of the Biological Sciences Department at the time) in which we were discussing grade inflation. A very distinguished member of the faculty confessed that he gave low grades when he first came to Dartmouth, but he didn’t anymore because students came to his office to question why they received a low grade. Students actually came to his office to figure out why they got a poor grade on an assignment. Go figure. This person’s solution to this “problem” was to simply stop giving low grades because he didn’t want to be bothered to have to explain and justify his reasoning.
My response to him then and my response now is simply that explaining to students why they did poorly on an assignment, a paper or an exam is fundamental to your job as an educator. If you can’t explain to a student what was substandard about their work and how they could improve their work, you should not be teaching.
Interacting with poorly performing students is the most important responsibility of an educator. The students who are doing well in a class don’t need your help. The students who are doing less well are the ones where your effort should be directed.
More importantly, if you do not give lower-performing students lower grades, how are these students to know that they are not doing as well as they could be? Giving grades appropriate to the level of mastery of the subject is the fundamental signal that students receive about their performance. If a student is getting an A or B in your class, they think they’re doing well.
More than one Dartmouth faculty member has told me,
Every student knows that if they get a B from me they flunked the class.
My response to them has invariably been that those students are not marking up their transcripts with “flunk” next to every B they receive. And what is the incentive for the student to work harder if the lowest grade they can possibly receive in a class is B (and at Dartmouth as elsewhere, some classes have A- as the lowest possible grade)?
I am never held accountable for the grades I give. I’m sure most people believe that a great scientific knowledge base rationalizes how grades are given, much thought is put into what grades actually mean, and great pains are taken to ensure that grades are allocated fairly and responsibly. Nothing could be farther from the truth.
Here’s how it works at Dartmouth, and it works this way everywhere else I’ve taught (I’ve taught courses at Michigan State University, University of Virginia, Bowling Green State University and Dartmouth). You teach your class and determine what the final grade is for each student. Then you either write these grades onto a form page, or you open up a webpage on the Registrar’s site to enter the grades. Once you have typed in the grade for each student, you sign the form and turn it in or click the submit button on the webpage, and you’re done. Moreover, unless a student or parent complains about a grade, you’ll never hear anything ever again about your grades. They never come up in any context ever again (unless a student or parent complains).
I am never called to account for the grades I give, I never have to justify the grades I give, and the only thing that determines what grades I give is my own conscience. This is true everywhere. That is, until someone complains, but nobody complains about receiving unwarranted high grades. Do you see how this causes grades to rise?
Changing Incentives To Favor Academically Rigorous Classes
Why is the quality of my research scrutinized, but the quality of my teaching is not? The quality of my teaching can be directly and unambiguously determined by (1) the rigor of the content I present to students in a class, (2) the methods I use to impart that content to the students, (3) the standards to which I hold students accountable for engaging with that content, (4) the methods I use to evaluate students’ performance relative to these standards, and (5) the grades I actually give to students based on their performance. The distribution of grades given in a class is clearly a metric of a number of these (see here, here, here, here, and here for more specific arguments). However, I am never held accountable for the rigor of my classes or the grades I give to students, and incentives abound to decrease the rigor of my classes.
Our proposal to the Dartmouth administration results from a very simple conclusion. The grades given at Dartmouth reflect either or both of these two problems:
We are giving our students higher grades than many of them deserve.
– OR –
Our courses are so non-rigorous that the majority of students can achieve excellent mastery with little effort.
Either is hugely detrimental to the education that our students receive (see the post links above), and grade inflation is a fundamental symptom of both. Moreover, the solution to these underlying causes is very simple:
Offer challenging courses and grade them according to the Dartmouth Scholarship Ratings.
If you’re not familiar with the Dartmouth Scholarship Ratings, these are the definitions of grades at Dartmouth. I’m pretty sure that just about everyone would agree that these are reasonable (and the generally agreed upon) definitions for grades. For example, a grade of A is defined as “excellent mastery” of the course material, and a grade of C is defined as “adequate mastery”. Last year, 58.7% of all grades given at Dartmouth in all undergraduate courses were A or A-. (Institutions of higher learning like Harvard and Yale passed the 60% threshold years ago – see here and here.) On its face, one has to conclude that if nearly 60% of students can develop “excellent mastery” in every course, that institution is not teaching rigorous courses. Alternatively, perhaps the courses are rigorous, but students are not being held accountable for their performance, with all the attendant problems that come along with that. Thus, we’re between a rock and a hard place here.
Our proposal to the Dartmouth administration has a very simple underlying premise, namely that faculty should be held accountable for teaching rigorous courses, just as they are held accountable for doing rigorous scholarship and research. As I explained above, the rigor of my courses is not part of the evaluation process of my teaching, and so I have no incentives–other than my own conscience–to maintain or increase their rigor. Moreover, many incentives (outlined above) work to make faculty decrease the rigor of their courses.
Our proposal also has a very simple rule for assigning grades to students.
All high-performing students should receive high grades, all intermediate-performing students should receive intermediate grades, and all low-performing students should receive low grades.
The Princeton and Wellesley plans for combating the causes of grade inflation failed, because they treated the symptom and not the cause. Both instituted some form of a quota system on grades. With a quota, some substantial fraction of students will not be getting the grades they deserve, either too high or too low. If a course is not academically rigorous, some students should not be penalized because all of them can get an A. Besides, grade inflation is not caused by the students: it is caused by the faculty. The solution to this pernicious educational problem lies with changing the behavior of the faculty, both in increasing academic rigor and holding them accountable for that rigor.
Thus, our proposal is all about changing the incentives motivating faculty and entire departments in the ways they teach and what it means to assign grades. You can read the full report here, so I will only summarize the major points of our proposal. Each addresses a specific issue for faculty that either removes an incentive to decrease rigor or applies an incentive to increase rigor.
1. Remove course grading gimmicks and drop/add extensions that favor students not fully committing to their courses. Dartmouth, like most places, has a number of policies that are meant to motivate students to take “risks” in their education–to take courses outside their comfort zone. However, almost no student uses these for their intended purposes. The two most egregious policies at Dartmouth are the Non-Recording Option for grading and the Withdrawal. The Non-Recording Option was instituted in 1967 presumably to attempt to curb the grade inflation that was apparent even then. A student electing this option for a course stipulates what grade they would like to achieve before the beginning of the term. If the student meets or exceeds that stipulated grade, they receive that grade, but if they do not achieve the stipulated grade while passing the course, they receive an NR (if they failed the course, they receive a failing grade of E). Data on self-reported time spent outside of class on classwork show that Dartmouth students spend by far the least time on their coursework if they NRO a class (you can see the data here). Dartmouth students can also Withdraw (called a W) from a class until 10 days before the last day of class. If the student withdraws after the normal drop-add period (10 days from the start of the class) but before this final deadline (10 days before the final class), a W appears on their transcript for this class to signify that they withdrew late. Both of these options incentivize students to only dabble in these courses and to not fully commit to them. Why work hard if I can always drop at the last minute (the W option)? Or if you’re stupid enough to give me an A- for little to no work (the NR option), I’ll take it.
2. Decisions about allocating resources to academic units should be based solely on intellectual and educational merit. Enrollment numbers, either numbers of majors or numbers of students in courses, should only be considered in such decisions if the units have increasing enrollments and can make a compelling educational case for how those resources will be used. Since enrollment patterns mainly track national trends, it is unreasonable to expect faculty or departments to boost their enrollments against these trends, because the only ways they can achieve these gains would be to decrease rigor. Thus, we propose that enrollments cannot be part of any decisions about resource allocation for departments with historically flat or declining enrollments. This completely removes the incentive for departments to pressure individual faculty to teach less rigorous courses on the illusory assumption that this will attract more students to the department.
3. Faculty should not be penalized for teaching courses with low enrollments. Currently at Dartmouth, a class with 5 or fewer students enrolled will be cancelled by the administration, and the faculty member teaching that course will have to teach an extra course the next year. Many reasons exist for why some courses have low enrollments, and many are perfectly legitimate. For example, some departments (e.g., languages) have few majors, and so courses in the major have a very small student population to draw from anyway, and yet these courses must be taught for those majors. Other courses that have moderate enrollments (e.g., 15-20 students on average) will have the occasional year where few students enroll simply because of the stochastic nature of enrollment patterns. Obviously, the administration should work to make the teaching loads across faculty as equitable as possible: faculty that teach low enrollment courses should also teach in large enrollment courses. However, we heard innumerable stories about how faculty make their courses less rigorous in order to not have their courses cancelled and be forced to give up research time to make up for the cancelled class. This incentive should simply be removed.
4. Each department should have to demonstrate the rigor of their courses, explain the standards to which they hold students in their classes, and justify the grade distributions they give based on those standards. Surprisingly, faculty within most departments never discuss what grades in their courses mean or how they determine what grades to give. This point of the proposal is meant to motivate these conversations among faculty within departments, and then to have each department justify to its peer departments what its standards are and whether it holds students to those standards. The simple act of talking about grading practices, grading standards and grading assignments will go a long way to restoring grades to a logical distribution.
5. Individual faculty should be held accountable for the rigor of their courses and the standards to which they hold students in their classes, and should justify the grade distributions they give based on those standards. A key component of our proposal is making faculty articulate the standards to which they hold students in their classes, and then evaluating whether they actually hold students to those standards based on the grade distributions they give. These considerations should be involved in all evaluative decisions about a faculty member’s teaching performance: promotion and tenure decisions, merit raises, consideration for honors.
Our proposal is to have faculty articulate what a student must do to attain each grade in a class (e.g., you will receive a grade of C for this level of performance, but a grade of A for this level of performance), and then evaluate whether they are holding students to those standards based on the grade distributions they give. The first issue to evaluate is whether the level of rigor and the standards for students are appropriate for the class. If they are appropriate, do the grades given in the class then reflect what should be expected from a group of people at their career stage for these standards? These statements of standards should also appear in the syllabi of courses so that students have a clear and unambiguous understanding of what is expected of them.
In addition, because student evaluations are also part of these proceedings, they must be read with appropriate expectations about the distribution of opinions they contain. A broad range of opinions, including some negative evaluations, should be expected (as I described above). Student evaluations must not be read as a measure of popularity.
If your promotion, tenure and merit raises depend on you being able to clearly demonstrate that you offer rigorous courses and hold students to high and objective standards, you might just do it.
And if institutions and individuals would just do that, the grades would take care of themselves.