Monday, August 8, 2016

My response to the NY Times article about the political uses of test scores

Update: See Sunday's NY Post, which, unlike the other papers, actually gave some historical context for our skepticism.

On Friday and Saturday, some reporters started walking back their earlier stories that gave undue credence to the apparent test score increases in NY state and NYC. They made some of the same points I made in my blog on Thursday: that any claims of improved achievement were untenable, that this year's results cannot be compared to last year's, and that too many groups were making these claims for political reasons. In the process, however, these reporters caricatured my position in the debate.

First, please read my blog from Thursday if you haven't already. Then read Saturday's NY Times. Although the reporter Liz Harris now seems to agree that any claims of improved achievement are untenable, she makes several errors in the process.

1. The article identified me as leading "some groups opposed to the tests." I don't lead any "anti-testing groups." There are several NYC groups that might be characterized as such, but I lead none of them. I run Class Size Matters, which is dedicated to reducing class size in the NYC public schools and the nation as a whole, and I co-chair the national group the Parent Coalition for Student Privacy. Moreover, as I explained to Liz, I support testing, if the tests are well-designed, grade-appropriate, stable, and consistently scored, with no stakes attached. That's why I paid such close attention in my blog to the NAEPs, the national exams that feature reliably scaled results.

Why? The results of such tests are among the very few ways we can objectively track trends in student achievement. The use of such tests is also one of the reasons we know for sure that smaller classes lead to more learning and a narrowing of the achievement gap between racial, ethnic, and economic groups. Because I believe in the importance of testing to help diagnose whether a student is learning or whether system-wide policies are working, I get especially angry when the results are distorted for political ends.

As I also explained, because of repeated changes in the state tests and their scoring, NY hasn't produced tests that can reliably track learning or achievement trends since at least 2002. Since then, we have had 14 years of wild swings: huge test score inflation between 2002 and 2009, then sudden deflation, then another apparent drop in achievement when John King decided to impose new Common Core tests and set the proficiency levels to prove that two-thirds of the state's students were failing. This year, to assuage the opposition of teachers and parents who said the tests were too long, too confusing, and too stressful, and to counter the opt-out movement, the Commissioner shortened the exams and administered them untimed, which meant that this year's results could not be compared to last year's. Yet the NY Times reporter insisted on characterizing me as "anti-testing," presumably to imply that I was politically motivated as well.

After the NY Times reporter told me they were going to focus on the politics rather than the accuracy of the state's claims, I wrote, "The politics are irrelevant to me.  We and others pointed out the test score inflation under Mills/Bloomberg/Klein and we're pointing it out now under Elia/deBlasio/Farina.  The more that things change, the more that they stay the same."

2. The reporter also wrote that I had claimed "the state had manipulated the underlying data to have more children pass." This is untrue. In my blog and in my conversation with her, I pointed to at least five reasons why reporters and members of the public should be skeptical of any claims of improved achievement in NYC and statewide. These include: the state's history of test score inflation; more recent NAEP trends, which run in the opposite direction of the state score trends; the fact that the state tests were shorter and given untimed this year; and the fact that 95% of the state's districts had an opt-out rate of 5% or more, with many districts at nearly 50%, while the city's participation rate was much higher, at more than 95%. These disparities make not only any accurate judgments about achievement tenuous, but also any comparison between the state's results and the city's especially suspect. I also said that there was additional evidence that the state MAY have manipulated the results: the percentage of raw score points (out of the total possible) required for proficiency dropped on 11 of the 12 exams this year, even as the Commissioner stated that the tests were equally rigorous. As I wrote in my blog, "We won’t know if the questions were harder or easier until the state releases the P-values and provides other technical details."
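To make the raw-score point concrete: if the underlying distribution of student performance is unchanged, simply lowering the raw score required for "proficient" mechanically raises the proficiency rate. Here is a minimal sketch with invented numbers, not actual NYSED data:

```python
# Hypothetical illustration: the raw scores below are invented,
# not drawn from any actual NYSED exam.

raw_scores = [18, 22, 25, 27, 29, 31, 33, 35, 38, 41]  # one raw score per student

def proficiency_rate(scores, cut):
    """Percent of students scoring at or above the raw-score cut."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

# Same students, same answers -- only the cut score moves:
print(proficiency_rate(raw_scores, 30))  # 50.0 -- at a higher cut
print(proficiency_rate(raw_scores, 27))  # 70.0 -- at a lower cut
```

Whether such a drop in the required raw score is a legitimate statistical adjustment or an arbitrary boost is exactly what the withheld technical data would reveal.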

The state has the P-value data now. A P-value, or "probability value," refers to the probability that a student will answer a question correctly; the higher the P-value, the easier the question is assumed to be, based on responses to field test questions embedded in the prior year's exams. Releasing these values would help demonstrate whether the "adjustments" SED made in lowering the number and percentage of raw score points required for each level, compared to last year, were soundly based on statistics or arbitrarily drawn to boost apparent performance. Until then, we simply don't know. Instead of merely quoting the state's critique of our analysis, reporters should demand more transparency from NYSED. Indeed, the public deserves answers.
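For readers unfamiliar with the statistic, an item's P-value is nothing more exotic than the fraction of test-takers who answered it correctly. A minimal sketch, using hypothetical responses rather than the state's actual field test data:

```python
# Minimal sketch of the P-value statistic: the 0/1 item responses below
# are hypothetical, not actual NYSED field test results.

def p_value(responses):
    """Proportion of students who answered an item correctly."""
    return sum(responses) / len(responses)

easy_item = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # 8 of 10 correct
hard_item = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]   # 3 of 10 correct

print(p_value(easy_item))  # 0.8 -- high P-value, presumed easier item
print(p_value(hard_item))  # 0.3 -- low P-value, presumed harder item
```

If this year's items carried higher P-values than last year's, the questions were easier, and the lowered raw-score cuts would need a stronger justification than the Commissioner's assurance that the tests were "equally rigorous."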

3. Another weakness of the NY Times article was its omission of the Commissioner from the list of individuals and groups claiming that the data showed real improvements in learning. Though the NYSED presentation did include the telling phrase “because of the changes made to the 2016 exam and testing environment, the 2016 test scores are not an ‘apples-to-apples’ comparison with previous years,” the Commissioner then proceeded to make just such a comparison, showing chart after chart of big jumps in proficiency this year. Here are some of her tweets on the subject:

[Embedded tweets not preserved.]
While she said in the presentation that "We cannot pinpoint exactly why the test [scores] increased," she then variously attributed the gains to students in grades 3 and 4 having "received instruction in the Common Core since Kindergarten and 1st grade," and to teachers having more experience teaching the standards. In a radio interview, she said the bump in ELA scores was due to an intensified focus on literacy.

Elia's insistence on ignoring the disclaimer in her own presentation encouraged NYC reporters to recite the stats showing gains in proficiency, and to echo the claims of great improvement announced by the Mayor and Chancellor, while minimizing any mention of the dubious nature of these claims. In the case of the NY Times, the "apples to apples" disclaimer was relegated to the 8th paragraph of the article. In a follow-up story on July 31 about the purported gains at the Renewal schools, the Times reported similarly big jumps in proficiency in these struggling schools, again reserving the "apples to apples" caveat for the 8th paragraph of the story.

Chalkbeat NY ran a story last Friday purporting to show that testing experts disagreed with the NYSAPE analysis, which argued that the drop in the raw scores may have contributed, along with other changes, to the increase in proficiency rates. After "eyeballing" the changes in the 3rd grade ELA raw scores, Jennifer Jennings said "that just looks like year-to-year variation." I agree that a larger contributor to the big leaps in ELA, particularly in 3rd and 4th grades, may have been the fact that the exams were shorter and untimed, especially since many teachers said kids didn't finish them in 2014 and 2015. That is not to say the raw score adjustments weren't a factor as well, particularly in math and in other grades.

The article also quoted Aaron Pallas, who said he couldn't tell if the changes in the raw scores mattered until the technical report is released, a year or so from now. This may be true, but there is no reason that reporters and researchers should not urge the state to release the P-values more quickly, to allay our concerns.

The Chalkbeat article concluded with Daniel Koretz, who said simply that last year's results can't be compared to this year's. Exactly right. Then why did Chalkbeat itself run several stories recounting the surge in proficiency, and even speculating on the possible causes, including the potential effects of the city's education "reforms," the expansion of charters, and/or the teaching of the Common Core?


4. Then there's the lack of any historical context. The paper of record has a lamentable history of failing to report on the well-documented evidence of inflated test score gains that occurred from 2003 to 2009, until the state itself admitted what had happened and re-calibrated the cut scores in 2010. Its unshakeable credulity led to a front page story on August 3, 2009 -- a little more than seven years ago to the day -- recounting the big jump in student achievement and giving credit to the Bloomberg reforms. Like now, the paper declined to acknowledge the multiple sources of evidence to the contrary, including the fact that the NAEPs showed only modest gains over the same time period. My argument with the Times editors even made the Village Voice. As Wayne Barrett wrote,

The Times front page piece last week -- headlined "Gains on Tests in New York Schools Don't Silence Critics" -- failed to quote any real critics, but gave Klein six self-promoting paragraphs. It did bury a single questioning quote from two academics not known as critics of the test scores in the thirty-fourth paragraph, but the top of the story trumpeted success scores that would have silenced any critic. If, that is, they were true.

Two days after the article ran, the NY Senate voted to renew mayoral control. A few months later, Bloomberg was re-elected to a third term. Sure enough, when the test score bubble burst in 2010, all the gains were shown to be illusory. Even after that, though, in 2011 a writer for the NY Times Magazine reported that "since 2006, the city's elementary and middle schools have seen a 22-point increase in the percentage of students at or above grade level in math (to 54 percent) and a 6-point increase in English (to 42 percent)."

These statistics, provided by the DOE to the reporter, were completely fabricated, of course, and somehow neither the reporter nor any editor had bothered to check them. It turned out the DOE had made up the data by re-adjusting the cut scores to where the state had previously put them, essentially rewriting history as though the test score inflation and deflation had never occurred.

All this is to say: if reporters at the NY Times and other media outlets are now prepared to point out the unjustified claims promoted by public officials and some advocacy groups last week, that is good; but they might also try to provide some context to explain the larger reasons for skepticism.

1 comment:

Kemala said...

Thanks for this, Leonie. It really blows the mind that these reporters conveniently skim over how comparisons (and therefore "growth") CANNOT be calculated when testing conditions and the tests themselves vary from year to year. Lazy (and irresponsible) journalism.

The one thing I would note is that the tests were not meaningfully "shorter" for the kids who had to take them, differing by only a few questions.