A few years ago, one of my former graduate professors casually suggested, “Let the children grade the teachers. It’s just as reliable as any out there now if not more.”
Intellectually, it made no sense: students, especially young ones, can’t possibly know what makes an effective teacher. Their minds are still developing, and their judgments are often mercurial.
But the doc was right, and the science backs him up. Studies consistently demonstrate that student evaluations hold up over time and across raters. Young students in the primary grades know a good teacher (as opposed to a nice one) when they see one; in fact, people’s first impressions and perceptions are surprisingly accurate. Researchers at Harvard had college students watch two-second silent clips of teachers they had never met and rate each teacher’s effectiveness. Those snap ratings were then compared with traditional end-of-semester evaluations from students who had spent a full semester with the same teachers. The two sets of ratings were remarkably similar, suggesting that these snap judgments tap fast, evolutionarily old instincts.
I know which teachers in my former schools were effective, and most other teachers do too. So do many involved parents. Which means principals, biases notwithstanding, have an even tighter read on their staff. That takes us back to square one: what are we really accomplishing by scrutinizing teacher performance and their added value to the classroom?
Not much. It’s just a way to appear productive and scientific to a credulous public, with nothing substantial behind it, while promoting a shadowy agenda. Parents crave concrete information about their child’s teacher, so it’s not a hard sell.
But these measures need to be both scientifically reliable and valid, especially for high-stakes use.
RELIABLE measures are consistent and precise; VALID measures tell us what we actually want to know (is this teacher effective?). Height, for example, is a reliable measurement–it is precise and consistent (someone who is six feet tall is six feet tall everywhere)–but it is not a valid measure of basketball prowess (not everyone who is six feet tall is good at the game).
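The height-versus-basketball distinction can be made concrete with a short simulation. This is an illustrative sketch, not anything from the research discussed here: the numbers (average height, measurement noise, the weak link between height and skill) are invented assumptions chosen only to show how a measure can be highly reliable yet have little validity.

```python
import random

random.seed(0)

n = 1000
# Assumed "true" heights in inches (mean 70, sd 3) -- purely illustrative.
height = [random.gauss(70, 3) for _ in range(n)]

# Measure each person's height twice with tiny instrument noise.
# The two readings agree almost perfectly: that's RELIABILITY.
measure_1 = [h + random.gauss(0, 0.1) for h in height]
measure_2 = [h + random.gauss(0, 0.1) for h in height]

# Hypothetical basketball prowess: height contributes only a little,
# and most of the variation comes from other factors (the noise term).
prowess = [0.1 * h + random.gauss(0, 5) for h in height]

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(f"reliability (test-retest): {corr(measure_1, measure_2):.2f}")  # near 1.0
print(f"validity (vs. prowess):    {corr(measure_1, prowess):.2f}")    # far lower
```

Under these assumptions the test-retest correlation is close to 1.0 while the correlation with prowess is near zero: the tape measure is trustworthy, but it is answering the wrong question.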
Proposed value-added (VA) evaluations tied to student scores are NEITHER reliable NOR valid. They are not precise or consistent enough (observation ratings can vary widely between lessons and between observers, as the recent Gates Measures of Effective Teaching study and the TC study found), and they do not tell us what we want to know: do flat student scores mean the teacher is ineffective? What do varying levels of test-score gain or loss in one class say about one teacher?
Don’t get me wrong: I am all for developing fair and useful measures of effective teaching, just not for high-stakes use. They’re better suited to guiding teacher development and improving the profession–i.e., training. Have you ever noticed that teachers hate these evaluations? It’s not because they’re afraid of accountability, but because VA assessments are incredibly capricious. Fewer than 10 percentage points separate teachers at the 75th percentile from those at the 25th, according to Steve Cantrell of the MET study. At the same time, a single teacher’s measured effect can swing widely from year to year, which only reinforces the unreliability argument.
Researcher Matthew Di Carlo of the Shanker Institute does a great job breaking down what makes an evaluation valid and reliable, so I won’t go there. The real benefit for children is to focus more on improving the profession–on TEACHING–and less on TEACHERS. That perspective implies upfront investment in recruiting, developing, and supporting teachers, as countries like Finland, Singapore, and South Korea do, rather than back-end performance evaluations of individual teachers. Leave those to the students. They are usually right.