By Jenny Anstey
With the 2020 US Presidential election looming, and poll calculations making headlines daily, it is hard not to think back to 2016. Donald Trump and the Republican Party won the electoral college, despite the fact that Hillary Clinton won the popular vote and the majority of polls beforehand predicted a “historic win” for Clinton.
One poll predicted that Clinton had an 89% chance of winning, so why did the election fall in the favour of the 11%? How can the voting behaviour of a country with such a large and diverse population possibly be predicted?
According to Pew Research, the reason for such inaccurate poll predictions in 2016 lay with the overrepresentation of college-educated people in data collection. People with college educations are both more likely to answer surveys and to favour the Democrats. This consequently led to the reported margin of error being half of the true margin of error.
In a New York Times podcast, Nate Cohn argued that white non-college-educated members of the public are “statistically underrepresented in the polls” and that the Democratic Party likely assumed it would not need this vote in order to win. He highlighted that any probability of loss is a significant probability, even that 11%.
Polls are conducted differently by different companies and organisations in the US: CNN and Fox conduct surveys over the telephone using live interviewers, while CBS News and Politico conduct online opt-in surveys. The worry is that, given the right amount of money, anyone can conduct a poll and publish predictions, which may lead to biased and inaccurate data.
In order to represent the entire population as accurately as possible, weightings are applied to the data gathered. Raw data would not give an accurate projection of the entire population; it is impossible to ensure that every demographic in society is proportionally accounted for within a sample.
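To make the idea of weighting concrete, here is a minimal Python sketch of weighting a sample by education. It is an illustration only: the sample, the population shares and the function name `weighted_support` are all invented assumptions, and real pollsters weight on many more characteristics at once.

```python
# Hypothetical raw sample of 100 people: (education group, preferred candidate).
# All numbers are invented for illustration; this is not real polling data.
sample = (
    [("college", "A")] * 40 + [("college", "B")] * 20
    + [("non_college", "A")] * 15 + [("non_college", "B")] * 25
)

# Assumed share of each group in the real population (e.g. from a census).
population_share = {"college": 0.35, "non_college": 0.65}


def weighted_support(sample, population_share, candidate):
    """Return (raw, weighted) support for `candidate` as fractions."""
    n = len(sample)
    # Share of each group within the sample itself.
    sample_share = {
        group: sum(1 for g, _ in sample if g == group) / n
        for group in population_share
    }
    # Each respondent's weight: population share / sample share of their group.
    weights = [population_share[g] / sample_share[g] for g, _ in sample]
    raw = sum(1 for _, c in sample if c == candidate) / n
    weighted = (
        sum(w for w, (_, c) in zip(weights, sample) if c == candidate)
        / sum(weights)
    )
    return raw, weighted


raw, weighted = weighted_support(sample, population_share, "A")
print(f"raw: {raw:.3f}, weighted: {weighted:.3f}")
# prints "raw: 0.550, weighted: 0.477"
```

In this invented example, college-educated respondents make up 60% of the sample but only 35% of the assumed population, so the raw figures overstate Candidate A's support; once the groups are weighted back to their true size, A's apparent lead disappears.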
The Economist is calculating its predictions for the US election by simulating over 20,000 pathways, allowing factors such as polling error, changes in turnout and changes in the political environment to be taken into account.
There is also consideration of how likely the state is to vote in such a way that decides the outcome of the election. This is in addition to comparing demographic and political profiles of a state, including factors such as the share of white voters and how religious the population is.
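The simulation approach can be sketched in a few lines of Python. This is not The Economist's model: the four-state map, the polled margins, the error sizes and the function name `simulate` are all assumptions made up for illustration. The sketch only shows the general technique of running many simulated elections with random polling error.

```python
import random

# Hypothetical inputs: state -> (electoral votes, polled margin for
# Candidate A in percentage points). Invented numbers, not real data.
states = {
    "State 1": (20, 3.0),
    "State 2": (15, -1.0),
    "State 3": (10, 0.5),
    "State 4": (10, -4.0),
}
VOTES_TO_WIN = 28  # a majority of the 55 votes on this toy map


def simulate(states, n_sims=20_000, national_sd=2.5, state_sd=3.0, seed=0):
    """Estimate Candidate A's win probability by simulating polling error."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # A shared error moves every state together, because polls can all
        # miss in the same direction; each state also gets its own error.
        national_error = rng.gauss(0, national_sd)
        votes = 0
        for electoral_votes, margin in states.values():
            simulated_margin = margin + national_error + rng.gauss(0, state_sd)
            if simulated_margin > 0:
                votes += electoral_votes
        if votes >= VOTES_TO_WIN:
            wins += 1
    return wins / n_sims


win_probability = simulate(states)
print(f"Candidate A wins in {win_probability:.1%} of simulated elections")
```

The shared error term is the important design choice here: if the polls are wrong in one state, they are likely to be wrong in similar states too, which is why a candidate with a modest lead everywhere can still lose a meaningful share of the simulations.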
A data journalist from the Financial Times explained that in the UK, we use multilevel regression and post-stratification (MRP) in order to ensure polls are representative. This involves a large poll sample and, importantly, asking people about their age, sex, ethnicity and education, for example, rather than only who they are voting for.
A regression is then run on the collected data to estimate how each type of voter is likely to vote. In the post-stratification stage, data from sources such as the census are used to obtain the number of people with each characteristic in each constituency. This allows the total number of votes each party is likely to secure to be calculated.
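The post-stratification stage can be illustrated with a short Python sketch. It assumes the regression has already been run and simply takes its output as given: the predicted support figures, the census counts, the constituency names and the function name `poststratify` are all hypothetical, and turnout is ignored for simplicity.

```python
# Assumed output of the regression stage: predicted probability of voting
# for Party X for each type of voter (age band, education). Invented values.
predicted_support = {
    ("18-34", "degree"): 0.62,
    ("18-34", "no_degree"): 0.48,
    ("35+", "degree"): 0.51,
    ("35+", "no_degree"): 0.37,
}

# Assumed census counts of each type of voter in each constituency.
census_counts = {
    "Constituency A": {
        ("18-34", "degree"): 12_000, ("18-34", "no_degree"): 8_000,
        ("35+", "degree"): 15_000, ("35+", "no_degree"): 25_000,
    },
    "Constituency B": {
        ("18-34", "degree"): 20_000, ("18-34", "no_degree"): 10_000,
        ("35+", "degree"): 18_000, ("35+", "no_degree"): 12_000,
    },
}


def poststratify(predicted_support, cell_counts):
    """Return (expected votes, vote share) for a single constituency."""
    # Multiply each group's predicted support by its head count and add up.
    expected_votes = sum(
        predicted_support[cell] * count for cell, count in cell_counts.items()
    )
    return expected_votes, expected_votes / sum(cell_counts.values())


estimates = {
    name: poststratify(predicted_support, cells)
    for name, cells in census_counts.items()
}
for name, (votes, share) in estimates.items():
    print(f"{name}: about {votes:,.0f} votes for Party X ({share:.1%})")
```

Although both invented constituencies have 60,000 people, the same predicted support per group gives Party X roughly 47% in one and 51% in the other, purely because of who lives there.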
From this, parties can decide how to use the data to inform their campaign and the public can ensure they are voting in a way that is tactical.
To keep the public as well informed as possible, the New York Times opted for ‘live polling’ for the 2018 midterms and has continued this for the upcoming election. This allows the public to see how the polls change over time and was done in an attempt to “demystify polling”.
Polling data can never be precise because human behaviour is unpredictable; all evidence may point to someone voting for the Republican Party, only for them to decide to vote for the Democrats on election day.
This may lead some to question why we bother conducting polls at all. When done properly, polls exist as an impartial way of informing the general public and giving an overview of where a population stands on particular issues. In this way, polling data can help inform political campaigns. This is why we often see a focus on a particular issue by a candidate; the polls predict that their target demographic sees it as a pressing, or even vote-deciding, matter.
However, greater care must be taken in the calculation and weighting of polls to ensure they are truly representative and as precise as possible. Polls are useful sources of information in an election, but they should not be taken to be final or entirely precise. Treating them as such can lead to complacency, and sometimes to people deciding not to vote after assuming the vote has already been lost or won.
Illustration: Amber Conway