With hundreds of polls online predicting election outcomes, understanding how election polls work is key to interpreting them for the 2020 presidential election. Research has helped identify where the 2016 election polls fell short, including the underweighting of location and education, so pollsters can hopefully deliver more accurate polls this year.
President Donald Trump was declared the winner of the 2016 presidential election even though many polls had indicated that Democratic presidential candidate Hillary Clinton would win, which increased public scrutiny of the accuracy of election polls. Looking into how election polls work can help us understand where the 2016 polls fell short and what we can expect from the 2020 polls.
“An Evaluation of 2016 Election Polls in the U.S.,” a report by the American Association for Public Opinion Research, found that national polls were actually fairly accurate in 2016, overestimating Clinton’s lead over Trump in the popular vote by only one percentage point. The popular vote is the number of votes cast nationwide for a candidate, regardless of how those votes translate into the Electoral College, the body of 538 electors representing all 50 states and Washington, D.C. that officially elects the president.
However, polling fell short at the state level, especially in Pennsylvania, Michigan and Wisconsin, according to Politics Prof. Paul Freedman.
“The polls miscalculated … in part because they failed to give enough weight statistically to — in particular — white voters from rural areas without college degrees,” Freedman said.
Weighting is a polling technique that adjusts how much each respondent’s answer counts so that the sample’s demographic breakdown matches the demographic breakdown of the population being polled. It is an important part of election polling because polls collect data from samples, which may not be perfectly representative of the population. Statistics Assist. Prof. Gretchen Martinet described how sampling works in many polls.
“With election polls … most of the reputable ones are going to take a sample covering someone from all 50 states, and so they would then do something like stratified sampling, but then within the strata, they may do other forms of sampling,” Martinet said. “It can get really complex, which is why weighting is important, because as you go down to the sample level, you have to weight back up every level that you sample.”
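To make weighting more concrete, here is a minimal Python sketch of one common approach, post-stratification on a single variable. It assumes a made-up sample of 10 respondents, a hypothetical population that is 40 percent college graduates and hypothetical candidates “A” and “B.” Real pollsters weight on many variables at once, often with more elaborate methods, so this is an illustration rather than any pollster’s actual procedure.

```python
from collections import Counter

def poststratification_weights(groups, population_shares):
    """Give each respondent weight = population share / sample share of their group."""
    counts = Counter(groups)
    n = len(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]

def weighted_share(answers, weights, candidate):
    """Fraction of the total weight held by respondents who chose `candidate`."""
    total = sum(weights)
    return sum(w for a, w in zip(answers, weights) if a == candidate) / total

# Hypothetical sample: college graduates are overrepresented (6 of 10).
groups = ["college"] * 6 + ["no_college"] * 4
answers = ["A", "A", "A", "A", "B", "B",   # college respondents
           "A", "B", "B", "B"]             # non-college respondents
population_shares = {"college": 0.4, "no_college": 0.6}  # assumed, not real data

weights = poststratification_weights(groups, population_shares)
unweighted = answers.count("A") / len(answers)
weighted = weighted_share(answers, weights, "A")
print(f"Unweighted support for A: {unweighted:.1%}")
print(f"Weighted support for A:   {weighted:.1%}")
```

Because this made-up sample has too many college graduates, who here favor A, weighting them down moves A’s estimated support from 50 percent to about 41.7 percent. Giving a group too little weight distorts the estimate in the same way, just in the opposite direction.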
Pollsters weight for an abundance of factors when calculating poll results. According to Politics Assist. Prof. Alex Welch, race, education and gender are three factors that are particularly important to weight for to ensure that all parts of the population are accurately represented.
However, some factors, including education and location, played a greater role in the 2016 election results than in other years. According to Martinet, pollsters did not account for these factors enough in 2016 and are trying to give them more weight this year.
In an effort to build a more accurate model, data science graduate students Matthew Thomas, Chad Sopata, Ben Rogers and Spencer Marusco created their own forecasting model for the 2020 election. Thomas and Sopata agree with Martinet that education and location were not weighted enough in 2016. However, Thomas pointed to another problem that contributed to the inaccuracy of the 2016 polls: in 2016, 20 percent of voters were undecided before Election Day.
“If someone was trying to predict the election, what they would typically do is ignore the undecided voters, or they might assume it splits roughly 50-50,” Thomas said. “But, what ended up happening is that … a huge majority of those undecided voters went to Trump.”
This is another factor to keep in mind when evaluating 2020 election polls, since a significantly smaller portion of the electorate is undecided heading into Election Day: only 3 percent were undecided a week before the election, compared with 11 percent at the same point in 2016.
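The arithmetic behind Thomas’s point can be sketched with a small Python function. The 4-point lead and the 70-30 break toward the trailing candidate are hypothetical numbers chosen for illustration; only the 11 percent and 3 percent undecided figures come from the comparison above.

```python
def final_margin(lead, undecided, share_to_trailer):
    """Leader's final margin (in points) after undecided voters pick a side.

    lead: leader's margin among decided voters (hypothetical)
    undecided: percentage of voters still undecided
    share_to_trailer: fraction of undecided voters who break for the trailing candidate
    """
    # Net points the trailing candidate gains from the undecided pool.
    net_shift = undecided * (2 * share_to_trailer - 1)
    return lead - net_shift

for undecided in (11, 3):
    even = final_margin(4, undecided, 0.5)       # the "split 50-50" assumption
    lopsided = final_margin(4, undecided, 0.7)   # hypothetical 70-30 break
    print(f"{undecided}% undecided: 50-50 split -> {even:+.1f}, 70-30 split -> {lopsided:+.1f}")
```

Under these made-up numbers, a 50-50 split leaves the lead untouched, but a 70-30 break with 11 percent undecided erases a 4-point lead, while the same break with 3 percent undecided only trims it to about 2.8 points.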
One more thing we can learn from the 2016 election when looking at polls this year is the importance of state election polls. According to Freedman, at the end of the day, it’s the electoral vote — the vote cast by a state’s Electoral College members — that matters rather than the popular vote, which is why state-level polling is so important.
“[As part of the forecasting model], we looked at state data going back to 1990 to get an idea of, fundamentally, where each state was sitting three or four or five months out from the election,” Sopata said.
The data science team believes this state-level focus sets its forecasting model apart from most other models, which forecast at the national level.
Although pollsters are generally learning from the 2016 polling mistakes, it can be difficult to predict how accurate the 2020 election polls will turn out to be.
One common myth is that polls will be completely accurate. According to Welch, statistical uncertainty and margin of error, which measure how unsure statisticians are about their estimates, must be taken into account.
“People don’t seem to understand statistical uncertainty and margin of error,” Welch said. “Those kinds of issues that make it so that polls are not going to get it completely 100 percent on the number.”
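For a simple random sample, the textbook 95 percent margin of error for a candidate’s share is roughly 1.96 × √(p(1 − p)/n), where p is the estimated share and n is the sample size. The Python sketch below applies that formula to hypothetical sample sizes. It assumes simple random sampling, so it understates the uncertainty of real polls that use stratification and weighting.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1000, 4000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n}: +/- {moe * 100:.1f} percentage points")
```

A poll of 1,000 people carries a margin of error of about 3.1 points, and quadrupling the sample size only halves it, which is one reason even large polls cannot pin a close race down to the exact number.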
While 2020 polling has aimed to fix several of the errors from 2016, Martinet points to response bias and leading questions as two sources of error that might still affect election polls this year. Leading questions are poll questions phrased in a way that steers respondents toward one answer. Response bias occurs when respondents do not answer questions truthfully, according to Martinet.
“So another thing that may have happened in the 2016 election is that voters for a certain candidate weren't truthful with the pollsters,” Martinet said.
With the complex nature of polling, how then do we interpret the 2020 election polls? Freedman explained that polls will never be perfectly accurate because they measure public opinion, which inherently changes from day to day. His advice for readers is to look not at just one poll but at an aggregate of polls, such as those published by FiveThirtyEight or The New York Times.
“One of the reasons that we like aggregation is that some polls are better than others [and] some pollsters are better than others,” Freedman said.
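A polling average can be sketched as a weighted mean in which more reliable polls count for more. In the Python example below, each poll’s weight is its sample size multiplied by a hypothetical 0-to-1 quality score. The polls and scores are made up, and this is not the actual method used by FiveThirtyEight or The New York Times, whose averages are more elaborate.

```python
def poll_average(polls):
    """Weighted average of a candidate's support across polls.

    Each poll is (support_pct, sample_size, quality), where quality is a
    hypothetical 0-1 rating of the pollster. Weight = sample_size * quality.
    """
    total_weight = sum(n * q for _, n, q in polls)
    return sum(s * n * q for s, n, q in polls) / total_weight

# Made-up polls: (candidate support %, sample size, pollster quality score)
polls = [(52, 1000, 1.0), (49, 800, 0.5), (51, 1200, 0.8)]
simple = sum(s for s, _, _ in polls) / len(polls)
print(f"Simple average:   {simple:.1f}%")
print(f"Weighted average: {poll_average(polls):.1f}%")
```

Here the small, low-rated poll pulls the weighted average down less than it pulls down the simple average (51.1 percent versus 50.7 percent), in line with Freedman’s point that some polls deserve more trust than others.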
Welch also recommends looking at polls conducted by colleges, such as Quinnipiac University and Franklin and Marshall College.
Although it may be difficult to predict exactly how accurate this year’s polls will be, pollsters have identified some problems from the 2016 election that they are taking into account this year, which is a promising sign. Additionally, Thomas noted that the polls have been fairly stable this year compared with the previous presidential election, when polls were unsteady and shifted week to week based on the news.
“Some news would come out, [and] Clinton support would go up,” Thomas said. “Something else would come out, [and] her support would go down.”
Sopata pointed to the team’s forecast model as evidence that polls have been stable this year.
“Once we started adding the polling data in, even if it was 50 days out, we didn't see much change in our outcomes moving up to this week before the election,” Sopata said. “I think in a typical election cycle, we would have seen quite a bit of up and down with our own predictions.”
FiveThirtyEight’s weighted polling average currently shows Democratic presidential candidate Joe Biden leading with 51.8 percent of the vote. However, the world will likely have to wait until after Election Day to see how accurate 2020’s polls truly were.