On education rankings.

Note: the original version of this post included a section on individual teacher ratings. I could not find public references for the statements I made, and have deleted these two paragraphs. My apologies!

I recently placed a copy of How to Lie with Statistics in a little free library near campus.  Not because I want people to be more deceitful – if you don’t understand how to trick others, then you yourself will be easy prey.  Numbers sound like facts.  They can be used for malicious ends.

Consider medical ratings.  These are ostensibly beneficial – prospective patients get to learn how well-trained their doctors are! 

Saurabh Jha wrote an excellent essay explaining why these rankings are misleading, “When a Bad Surgeon Is the One You Want.”  In brief, doctors who take easy cases will improve their ratings – their patients are more likely to have good outcomes.  When doctors are assessed on their patients’ outcomes, then the doctors who take hard cases will appear to be incompetent.  Even if they are much better at their craft than others.

The same phenomenon holds in teaching. Schools and teachers are often evaluated based on their students’ performance, without normalizing for the unique challenges faced by different populations of kids. 

This week, the Indiana Department of Education released federal evaluations of local schools. 

The elementary school located amidst our town’s most expensive houses, at which the lowest percentage of students receive free or reduced-price lunch, was rated as “exceeding expectations.”  

The elementary schools that serve our town’s most disadvantaged students – one of which holds bilingual classes in English and American Sign Language to support deaf children, and has 86% of students receiving free or reduced-price lunch – were rated as not meeting expectations.

My spouse and I are sending our own children to one of the schools that was rated as not meeting expectations.  We know a fair bit about education – among other things, my spouse is the editor-in-chief of a national journal of teacher writing.  I’ve observed classrooms in this low-rated school, and they are excellent.

But teacher morale is low, because the teachers are continually evaluated as being sub-par, despite the fact that they have chosen to work harder than others.  Our school district is mandating that teachers in the low-rated schools spend time on unfulfilling test-prep regimes, even though these practices are known to further alienate under-resourced students.

Our nation’s school administrators ought to read How to Lie with Statistics, it seems.  They’ve looked at a set of numbers and allowed themselves to be misled.  Which bodes ill for the learners in their care.

On paying teachers for value added.

I loved standardized test days when I was in school.  Instead of sitting in class being lectured at until the teacher noticed I was doodling again and booted me to the office, we’d all sit in the cafeteria, spend maybe ten minutes filling in bubbles, then get to doodle in peace.

The tests themselves were dull, but my friends and I enlivened them with our “points per minute” game.  By jotting down the time we finished every section, we could compare ppm scores for bragging rights even when everyone flat-lined at the same perfect score.  And all those freshly sharpened #2 pencils balanced out the funky smell of the cafeteria carpet.

But I understand that, for students less recalcitrant than I was, the ones who might actually learn something during regular instructional days, standardized tests waste time.  And the current barrage of tests doesn’t even fulfill its purported goal.

You want to pay good teachers more for doing their jobs well?  Great idea!  Most schools currently use a pay scale that rewards teachers only for the total number of years they’ve been in the business.  This causes several problems: older teachers have trouble finding new jobs because their salaries would be too high, and talented young people don’t want to go into education because their starting salaries would be so low.

This graph is from Brandon Wright’s excellent post about teacher pay.

Unfortunately, many otherwise reasonable people latched onto the mistaken idea that you could measure each teacher’s “value added” with a whole boatload of standardized tests.  This makes school worse for basically every kid who isn’t like me, since they’re stuck taking too many pointless tests instead of learning.  And, worse, the metric doesn’t even work.

I’m not sure everyone involved in this discussion even understands what “value added” means.  Here’s a quick definition: let’s say you have a product that’s worth 100 dollars.  Then you change it in some way.  If the product is now worth 110 dollars, you’ve added 10 dollars of value to it.

Simple enough, right?  If you’ve ever watched one of those cheesy TV shows about flipping houses, you’re probably an expert.

The example I like to start with is shipping.  Apples at an apple orchard might be worth two dollars a pound.  Anybody who wants an apple has to go to all the trouble of driving there.  But if someone loads them into a truck and brings them to a grocery store near people’s houses, the apples might be worth three dollars a pound.  Transporting apples from where they grow to where people eat adds value.

Another example is assembly.  Most companies that sell computers don’t manufacture their own components — maybe you’ve been stuck at a coffee shop with some hipster dude explaining that your Macintosh computer is full of Chinese parts that Apple marks up.  But that’s a valid business model.  They buy pieces and put them together into a functional device.  Of course they charge more for the resulting computer than the aggregate cost of the components.  They’ve added value by assembling it, making it so that even relatively clueless people can buy a computer and know that it’ll work.

So, teaching?  A teacher has a set of students, and the hope is that these students change during the year.  They might gain factual knowledge, or critical thinking skills, or the ability to work with others, or the ability to sit quietly in uncomfortable chairs and follow directions like mindless drones.

That list is a good segue into the first problem with the way people talk about “value added” for teacher pay — the idea doesn’t mean much until you specify what, exactly, you value.  What’s the purpose of public education?  By attempting to measure “value added” with a standardized test, you’re asserting that we send kids to school to improve performance on standardized tests.

Given how infrequently most adults take standardized tests in their day-to-day lives, I imagine this isn’t what most people think the purpose of school should be.

If we don’t care about how well kids learn to fill in bubbles nice and dark with a #2 pencil, then what should we value?  Well, we might care about workforce productivity, in which case your “value added” metric should track students’ eventual salaries or lifetime earnings.  Maybe we want to make people into better citizens, in which case we should measure how often people volunteer, or how often they vote, or what percent of students stay out of jail.  Maybe we care about something as ethereal and hippy-dippy as happiness, in which case we could use surveys to assess well-being, or look at how many former students are married, or track how many commit suicide.

Of course, most of the metrics I’ve suggested can’t be measured immediately.  With a bubble test, you zip ‘em through the scantron and five minutes later know how well everybody did.  With happiness, or eventual salary, teachers would have to wait several years to know the whole amount of any “value added” bonus to their salary.  To my mind, that’s fine — I think more industries should use long-term performance rather than short-term gains to assess bonuses — but maybe that seems weird to you.

Those long-term metrics should also hint at the fact that “value added” calculations would be incredibly complex.  If you’re looking at somebody’s eventual salary, how do you know whether it was great work on the part of their third grade teacher, or their fifth, or their seventh, or their twelfth that gave them the skills they’d need?

It’s not an impossible math problem – just a tricky one.  This kind of multivariate regression only becomes feasible when churned through by computers.
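To make the attribution problem concrete, here’s a minimal sketch.  Everything in it is invented: four hypothetical teachers, their “true” effects, and the dollar figures.  The toy assumption is that each grade’s teacher adds a fixed amount to a student’s eventual salary; with balanced class assignments, simple group means untangle each teacher’s contribution, expressed relative to the average teacher in that grade.

```python
# Toy model: eventual salary = base + effect of 3rd-grade teacher
#                                   + effect of 5th-grade teacher.
# All names and numbers are hypothetical, chosen for clean arithmetic.
from itertools import product
from statistics import mean

BASE = 50_000
effect_3rd = {"Ms. A": 3_000, "Mr. B": -1_000}  # hidden "true" effects
effect_5th = {"Ms. C": 2_000, "Mr. D": 0}

# One student per teacher combination: a perfectly balanced design.
students = [
    {"t3": t3, "t5": t5, "salary": BASE + effect_3rd[t3] + effect_5th[t5]}
    for t3, t5 in product(effect_3rd, effect_5th)
]

overall = mean(s["salary"] for s in students)

def estimated_effect(grade, teacher):
    """Mean salary of this teacher's students, minus the overall mean.

    In a balanced design, this recovers the teacher's effect relative
    to the average teacher in that grade.
    """
    taught = [s["salary"] for s in students if s[grade] == teacher]
    return mean(taught) - overall
```

Here `estimated_effect("t3", "Ms. A")` comes out to +2,000: her true effect of +3,000, minus the 3rd-grade average of +1,000.  Real data would be noisy and unbalanced, which is why you’d need an actual regression, but the bookkeeping is the same idea.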

But I think it’s good that the math is so clearly difficult.  Because the idea that you could assess “value added” with a standardized test given to students at the beginning and the end of the school year is bizarre.  Among other problems, the “test at the beginning, test at the end, calculate the gains” idea ignores differences between students.

A student with learning disabilities will probably gain less than average each year, independent of teacher quality.  A gifted student will probably gain more than average, again independent of teacher quality.  The teachers do matter, of course.  If you gave both Pablo Picasso and me some crayons and a piece of construction paper, his drawing would probably be better than mine.  But he’d add less value to that piece of construction paper than he would’ve been able to add to a canvas, if you instead gave him a canvas and some oil paints.

A meaningful “value added” metric for teaching would ask, “How much did this student gain, compared to what he or she would’ve gained if taught by an average teacher instead of this particular teacher?”

Again, I want to stress that this is a very complicated math problem.  But not impossible, as long as you have a population of many teachers and many students to obtain data from.  You’d need to find some criteria to match students to one another.  That way you can say, “This type of student usually gains this much during third grade when given an average-quality teacher.”
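The matching scheme can be sketched in a few lines.  Again, everything here is invented for illustration: three hypothetical teachers, two student “types,” and made-up gain scores.  The expected gain for each type is pooled across all teachers, and a teacher’s value added is how far their students’ gains beat the expectation for students of that type.

```python
# Hypothetical records: (teacher, student_type, gain_over_the_year).
from collections import defaultdict
from statistics import mean

records = [
    ("Ms. X", "gifted", 9), ("Ms. X", "gifted", 11),
    ("Ms. Y", "special_ed", 4), ("Ms. Y", "special_ed", 6),
    ("Mr. Z", "gifted", 7), ("Mr. Z", "special_ed", 3),
]

# Expected gain for each student type, pooled over all teachers.
by_type = defaultdict(list)
for _, stype, gain in records:
    by_type[stype].append(gain)
expected = {stype: mean(gains) for stype, gains in by_type.items()}

# Value added: mean of (actual gain - expected gain for that type).
by_teacher = defaultdict(list)
for teacher, stype, gain in records:
    by_teacher[teacher].append(gain - expected[stype])
value_added = {teacher: mean(devs) for teacher, devs in by_teacher.items()}
```

Notice what the matching buys you: Ms. Y and Mr. Z post the same raw average gain of 5 points, yet Ms. Y’s value added is positive (about +0.67) while Mr. Z’s is negative (about −1.67), because Ms. Y earned hers with the harder population.  A raw beginning-of-year-to-end-of-year comparison treats them as identical.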

One difficulty in sorting people this way is determining what matters.  What attributes define a student’s type?  Do you include parental income as a variable? A near-meaningless childhood IQ test?  Do you sequence every student’s genome and include genetic factors (Good Lord I hope not — even including ethnicity seems politically suspect — but that’s the sort of thing you’d want to consider)?

Your data would also be best if each teacher had a range of student types.  This is very different from how most classes are currently organized.  When I was in school, all the special education students were tracked together and had one set of teachers, while all the “gifted & talented” students were tracked together and had a different set of teachers.

Tracking would make an accurate calculation of “value added” more difficult.  Still not impossible, but less statistically robust.

Maybe that’s fine — it’s reasonable to assume that there are some teachers who’re good at working with gifted students, and can help them gain a lot, who might flounder if they worked with special education students.  I think the reverse is less likely to be true — because special education is harder, I bet most teachers who are good with special education students could do well by other students, too.

With a real “value added” measurement, I think you’d see that.  But if the powers that be cling to the mistaken notion that you can assess “value added” by measuring a difference in test scores between the beginning and end of the year, without considering that each student is unique, you’re instead going to conclude that all special education teachers are terrible.  Their students gain less!

You’ll guarantee that those teachers doing the hardest work are rewarded least.


As it happens, this exact same misconception about “value added” is making medicine worse, too.  If you’ve had your full dose of feeling dismal about what we’re doing to education, you should take a few minutes and read Saurabh Jha’s lovely post about this problem in medicine, “When a Bad Surgeon Is the One You Want.”