12 February 2012

Statistically significant

It is difficult to accurately assess the extent of cheating in a recent Honor scandal

By Tim Thornton on March 1, 2010

According to “Honor reviews exam discussion policies,” from the Feb. 22 edition of The Cavalier Daily, Andy Bean — the Commerce School’s representative on the Honor Committee — told the Committee of a professor who recycled 15 questions from a midterm exam on a final exam. The final was given to two sections on consecutive days. Scores on the second day were significantly higher than on the first day, according to the story. “Exactly what happened is hard to define,” Bean was quoted as saying, “but the significant difference in scores suggests there was a problem.”

Bean suggested that students from the first session may have tipped off the second group of test takers through innocent comments about the test’s difficulty. “I don’t think it’s malicious, or people are intending to commit an honor offense,” Bean was quoted in the story. “It’s an awareness issue — students don’t realize they are giving an advantage to another student.”

Committee Chair David Truetzel took a harder line, calling it an inexcusable honor offense even to say an exam was hard if the professor has explicitly forbidden discussion. To me, saying the exam was hard, or even that I wish I had spent more time studying this or that, isn’t “discussing,” and calling that an honor violation might be upholding the letter but not the spirit or intention of the law. I also thought there could be all sorts of explanations for a higher average score in the second group, beginning with the fact that those students had another day to study.

Of course, if students in the first section were stupid enough to give the second section help, there’s punishment built into the offense. Giving the second-day exam takers an advantage puts the first-day students at a disadvantage. I still hold to that last point after reading Prof. Blaine Norum’s column, his online comments, and the data he’s made available on the Web. But my understanding of nearly everything else changed. Norum was the professor Bean talked about.

If I understand Norum’s charts correctly, nearly 16 percent of the first-day exam takers got all 15 of the recycled questions right. On the second day, closer to 42 percent did. That’s more than a “significant difference.” That’s dramatic. Astounding, even. “The only viable explanation is that a very large fraction of the students in the second sitting had been forewarned that these questions would appear and acted on that information,” Norum wrote in his column.

I’m not sure it’s the only viable explanation, but it certainly is a viable explanation.
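For readers who want to check the arithmetic, here is a minimal sketch of a standard two-proportion z-test applied to the two percentages above. It is not Norum’s analysis, and the section size of 100 students per sitting is purely a placeholder, since I don’t have the real enrollment figures. The function name and the constant are mine.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value under the
    hypothesis that both groups share one underlying proportion.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: 100 students per sitting. The real section sizes are not
# given in the column, so this number is hypothetical.
HYPOTHETICAL_SECTION_SIZE = 100

# Share of students with all 15 recycled questions right: about 16 percent
# on the first day, about 42 percent on the second.
z, p_value = two_proportion_z(
    0.16, HYPOTHETICAL_SECTION_SIZE, 0.42, HYPOTHETICAL_SECTION_SIZE
)
print(f"z = {z:.2f}, two-sided p = {p_value:.6f}")
```

With sections of that assumed size, a gap this wide would be very unlikely to arise by chance alone. What the test cannot say is why the gap exists; it only rules out luck as a comfortable explanation.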

Norum has more than statistical analysis. He received at least two e-mails that spoke of cheating. One said that on the first day of test taking, students had accessed the old test online while taking the new test. Another said that students on the second day knew that questions were being recycled, so they studied the old test. One e-mail said “an exceptionally large number of students cheated,” which implies that cheating isn’t exceptional, but the number of cheaters was.

Norum claims to show statistically that “an absolute minimum of 47 percent of the students at the second sitting had cheated; a slightly more sophisticated analysis put the minimum at 60 percent.” The class was “significantly more than 1 percent of the undergraduate student body,” a larger sampling than pollsters use to predict elections, he wrote. “Thus, unless someone can find a reason why the students in this class were especially dishonest, one must conclude that the figure of at least 60 percent is representative of the student body at large.”

So, he concludes, at least 60 percent of the University’s undergraduates condone cheating. That, he argues, “illustrates with mathematical clarity what many faculty members know from their own experience: the Honor System as currently constituted and administered is dysfunctional.” Norum presented the data to both sections of the class the following semester. Did he present it to the Honor Committee so the system he calls dysfunctional would have a chance to function?

I don’t know, but I’d like a reporter to find out.

I’m a liberal arts major and I’m at least one graduate degree behind Norum, so it’s foolish of me to question his statistical analysis. But I know that pollsters are not infallible. Neither Dewey nor Gore became president. Even exit polls get it wrong. If not, Doug Wilder would have been elected governor of Virginia by nearly 10 percentage points instead of less than one. And I’m not sure Norum’s sample is random enough to be valid. Still, Norum may be right. Trying to determine if he is could be one of the most important stories The Cavalier Daily covers this year.

What does it mean if cheating really is acceptable to 60 percent of the University’s undergraduates?

Tim Thornton is The Cavalier Daily’s ombudsman. His column appears Mondays.

3 Responses to “Statistically significant”

  1. Blaine Norum says:

    Dear Tim,

    Liked your piece. It represents a first step in a discussion I would very much like to see occurring. Couple of points. First, regarding the wisdom of people from the first sitting helping those in the second sitting of a “curved” test. The explanation that makes sense to me is that many students from the first sitting each told a small number of friends about the questions, hoping thereby to help them without affecting the overall results enough to change the curve and affect their own grade. Second, polling has become much better since Dewey vs. Truman and the results in the case of the Wilder election are widely thought to have been affected by what is now called the “Bradley effect” whereby it is suggested that some people will not necessarily answer truthfully when race is involved. The “sampling” data I presented were based strictly on people’s actions (that is, their test scores) so should be more reliable than in the cases you cite. As to whether there is a basis for considering the sample to be unrepresentative, that is a question I myself alluded to in my original article.

    Your concluding comment & question are certainly provocative. The question is one I have asked myself over the last almost two years. Keep up the good work.

  2. tim thornton says:

    Thanks, Blaine.

    I’m certainly aware of the Bradley effect — though we Virginians prefer to call it the Wilder effect. And it’s my understanding that the Dewey polls weren’t the victim of poor procedure as much as poor judgment. The polls showed Dewey so far ahead that the folks paying for the polling didn’t want to buy another round. They quit polling early. That doesn’t explain the Gore results, but that would lead to a whole other discussion.

    I would argue that your sampling wasn’t of actions, but of results. Those results could be explained by cheating, by loose lips, by the second group being better guessers and multiple choice test takers, by the second group studying the old test without prompting, by former students talking about questions being recycled, by students blowing off early assignments and actually studying for the final, divine intervention, various combinations of these and probably some more things I haven’t thought of. Statistical analysis can tell us what’s more likely, but it can’t tell us anything for certain. Nevertheless, your very reasonable interpretation that cheating is the correct answer provides a way into what seems to me to be a very important discussion.

    I’ve had people tell me that the community of trust is the arbiter of what’s acceptable and what’s not, so it’s no big deal if what you or I consider cheating, the community defines as a friendly heads up to a classmate. Maybe.

    Different eras have different standards. (Shooting at the Rotunda clock would be considered more serious now than in the 19th century, I’m guessing.) But it would be interesting and useful to know more about where the lines are drawn. I hope the discussion continues and broadens, but since you and I are the only online commenters so far, I’m not all that hopeful.

    Tim

  3. Earnan says:

    To delve totally off topic: To many who have studied it, the Bradley effect is a myth. “A theory in search of data.”
