Statistical Principles of a PVT/Quick Count
The quick count methodology applies statistical principles to a very practical problem—verifying an electoral outcome.1 This chapter outlines those statistical principles and describes how they work together. The briefest way to present this information is to use the language of mathematics, and to a certain extent that language is unavoidable. The goal of this chapter, however, is to present the basic concepts in a non-technical way so that the logic behind quick count methodology is accessible to a general audience.
The first part of this chapter presents the foundations of quick count methodology. It begins by considering the robustness of quick count data and such core concepts as sample and population. The chapter then turns to an explanation of statistical principles, such as the law of large numbers and the central limit theorem. The second, more technical half of the chapter presents the process for constructing a sample. It outlines measures of central tendency and dispersion, and then discusses standard strategies for calculating and drawing samples. It also takes up practical questions, such as correction factors, that are designed to manage the unique problems that arise in the application of statistical principles to quick count situations.
BASIC STATISTICAL PRINCIPLES
Statistical principles drive the methodology for collecting and analyzing quick count data. This methodology is grounded in broadly accepted scientific principles. Like the law of gravity, these statistical principles are not just a matter of opinion or open to partisan interpretation; they are demonstrable and universally accepted. It is precisely because these principles are scientifically based that quick count organizers can make authoritative claims about election outcomes. It is one thing to claim that an election has been fair or unfair. Quick count methodology allows a group to demonstrate why election-day processes can be considered fair, or the extent to which they have been unfair.
Reliability and Validity
Statements made about election-day processes are only as strong as the data upon which they are based. Consequently, it is important to take quite deliberate steps to ensure that the data collected meet certain standards. One is that the quick count data themselves have to be “robust.” That is, the data have to be both reliable and valid.
Data are considered reliable when independent observers watching the same event (the vote count) and using the same measuring instrument (the observer form) evaluate that event in exactly the same way. A simple example illustrates the point:
- Three different people (A, B and C) repeatedly measure the height of a fourth person (Z) on the same day. The measure of that person’s height would be considered reliable if all three observers (A, B and C) using the same measuring instrument (a standard tape measure) produced exactly the same results in their measure of Z’s height.
The very same principle applies to quick count data collection; it is essential that both indicators and measurements are reliable. The information produced by observers should not change because of poor indicators, inadequate measurement instruments (an elastic measuring tape) or poor procedures—nor should the results vary depending upon who is doing the measuring. Reliable results will vary only when there are genuine changes in the phenomenon that is being measured. Reliable data, then, are data that can be independently verified.
Quick count data should also be valid. Validity concerns how well any indicator used actually fits the intended concept that is being measured. A measure is considered valid if the indicator used for measurement corresponds exactly, and entirely, to the scope and content of the object that is being measured. The previous example can be extended to illustrate the point:
- Three additional observers (D, E and F) are asked to report the size of the same person, Z. D and E might report that Z, who is six feet tall, is big, whereas F might say that Z is medium. The problem is that the concept of size is ambiguous and open to different interpretations; for some people it might mean more than just height; therefore, size lacks validity. D might consider Z big because Z is much taller than D. E might think of Z as big because Z is heavier than E. F might report that Z is medium because Z is about the same size and height as F, and F thinks of herself as medium. In fact, the ambiguity of the notion of size is a problem; it is a threat to reliability and validity.
It is for these reasons that exit polls and opinion polls should be interpreted with extreme caution. Exit and opinion polls often produce unreliable estimates of actual vote results on election day. This is because exit polls measure recollections, and opinion polls measure intentions concerning citizens’ votes. For quite understandable reasons, people are tempted to misreport either how they voted or how they intend to vote. Quick counts, by comparison, are reliable and valid because observers collect official vote count results from individual polling stations. Quick counts measure behavior, not recollections or stated intentions. They measure how people actually voted, not how they might have reported their vote to a complete stranger.2
The robustness of quick count data also depends on how the sample is constructed; the sample determines which votes are used as the basis for estimating election outcomes. The basic idea of a sample crops up in many different ways in everyday life. For example, chemists routinely take a “sample” of a compound and analyze that sample to make accurate statements about the chemical properties of the entire compound. Physicians take blood samples from patients to determine whether the composition of their blood is causing illness. Fortunately, physicians do not need to drain all of the blood from patients’ bodies to know exactly what it contains. Such an approach is impractical and unnecessary, since a single blood sample reveals all that a physician needs to know about the contents of a patient’s entire blood supply.
Quick count samples rely on exactly the same principles. An observer group might consider asking volunteers to observe every single polling station in the country and report every single result. That strategy would require a huge amount of resources, and it is unnecessary. Like the chemist and the physician, observer groups can learn everything they need to know about the entire voting population by using a carefully designed sample. The method is faster, cheaper and more practical.
Quick count samples provide a reliable foundation for making accurate estimates of the total population because a sample is a particular subset of the total population, a subset that reveals population characteristics. Even so, designing samples means making choices, and those choices have a profound effect on both the accuracy of the data and the kinds of data analysis possible.
Technically, a population refers to all the relevant individual cases that exist within a certain boundary. Often statisticians are not concerned with counting individuals. Quick counts are not interested in every individual living within the boundary of a particular country. Quick counts are concerned only with the relevant population—every individual who is eligible to vote. The quick count’s relevant population excludes all people who, for whatever legal reason, are not eligible to vote. The electoral laws of most countries have clear rules concerning voting age, for example. Very young people are not usually eligible to vote, although the precise age limit varies from one country to the next. Similarly, most countries have citizenship requirements that allow only citizens to vote in national elections.3
Getting from a Sample to a Population
Quick counts begin with the assumption that the vote count data themselves are reliable and valid. In other words, quick counts assume that the official vote counts produced at polling stations—the data collected by observers from each and every sample point—are robust information. In fact, observer groups are able to verify that assumption by undertaking a systematic qualitative observation of the voting and counting processes at the polling stations.4
If a systematic qualitative observation of election-day procedures establishes that the vote count data are reliable and valid, and if basic statistical principles are followed, then accurate estimates of the distribution of the vote for the entire country can be made on the basis of a properly drawn sample. It is possible to make very accurate estimates about the behavior of a population (how the population voted) on the basis of a sample (of the results at selected polling stations) because of the theory of probability.
Probability: The Law of Large Numbers and the Central Limit Theorem
Probability concerns the chance that an event, or an outcome, will occur. It is possible to estimate the probability of unknown future events – that Brazil will win the World Cup, or that it will rain today. No one knows ahead of time what will happen, but it is possible to make an educated guess based on the team’s performance in other events, or the meteorological conditions outside. It is also possible to make predictions about probability based on the known likelihood that something will happen. Consider the classic statistical example of tossing a fair coin, one that is unbiased:
- A coin is tossed in the air 100 times. With a fair coin, the chances are that the outcome will be heads 50 times and tails 50 times, or something very close to that. Suppose now that the same rule was tested using only a few tosses of the same coin. Tossing that same coin 12 times in the air might produce outcomes that are not evenly split; the outcome could be 9 heads and 3 tails. Indeed, in exceptional circumstances, it is possible that, with twelve throws, the coin could land heads up every time. In fact, the probability of such an unusual outcome can be calculated quite precisely: the chance of twelve heads (or tails) in a row is one in two to the twelfth power, (1/2)^12, which works out to one in 4,096, or about 0.024 percent. Probability theory indicates that the distribution between heads and tails will even out in the long run.
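As a quick sketch (using Python's random module as a stand-in for a fair coin), both the exact one-in-4,096 figure and the long-run evening out can be checked:

```python
import random

# Exact probability of twelve heads in a row with a fair coin
p_twelve_heads = (1 / 2) ** 12
print(p_twelve_heads)        # 1/4096, about 0.024 percent

# Law of large numbers: the share of heads approaches 0.5 as tosses grow
random.seed(42)              # fixed seed only so the run is repeatable
for n in (12, 100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)
```

With only 12 tosses the proportion of heads can wander well away from one half; by a million tosses it sits very close to 0.5.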
One aspect of probability theory at work in the above coin toss example is the law of large numbers. This statistical principle holds that, the more times that a fair coin is tossed in the air, the more likely (probable) it is that the overall distribution of total outcomes (observations) will conform to an entirely predictable and known pattern. The practical implication is clear: the more data we have, the more certain we can be about predicting outcomes accurately.
This statistical law of large numbers is firmly grounded in mathematics, but the non-technical lesson is that there is safety in numbers. A second example illustrates a related point important to understanding the basis of the quick count methodology.
- Consider a class of 500 students taking the same university course. Most students will earn Bs and Cs, although a few students will earn As, and a few will earn Ds or even Fs. That same distribution of grades would almost certainly not be replicated precisely if the same course had a class of 10 or fewer students. More importantly, the grades of exceptionally good, or exceptionally poor, students will have quite a different impact on the average grade for the entire class. In a small class, those “outlier” grades will have a big impact on the overall distribution and on the class average; they will skew the results of the grade curve. But in a larger class, any individual exceptional grade will have a far smaller impact on the average mark for the whole class.
The practical implication of the grade distribution example is simple: as the amount of data (number of observation points) increases, the impact of any one individual data point on the total result decreases.
A second statistical principle that is vital to quick count methodology is known as the central limit theorem. This axiom holds that, the greater the number of observations (sample points), the more likely it is that the distribution of the data points will tend to conform to a known pattern. A class of 500 physics students in Brazil will produce the same grade distribution as a class of 300 literature students in France, even though the marks themselves may be different. In both cases, most of the data points will cluster around the average grade.
These two statistical axioms – the law of large numbers and the central limit theorem – work in conjunction with each other. Together they indicate that:
1. the larger the number of observations (sample points), the less likely it is that any exceptional individual result will affect the average (law of large numbers); and
2. the greater the number of observations, the more likely it is that the dataset as a whole will produce a distribution of cases that corresponds to a normal curve (central limit theorem).
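A small simulation illustrates both principles at work; the skewed population and the sample sizes below are arbitrary choices for the sketch. Means computed from random samples cluster around the population mean, even when the population itself is far from bell-shaped:

```python
import random
import statistics

random.seed(1)

# A deliberately skewed "population": most values are small, a few are large
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.mean(population)

# Draw many independent random samples and record each sample's mean
sample_means = [
    statistics.mean(random.sample(population, 400)) for _ in range(1_000)
]

# Despite the skewed population, the sample means cluster tightly
# around the population mean in a roughly bell-shaped pattern
print(round(pop_mean, 2), round(statistics.mean(sample_means), 2))
```

The spread of the sample means is far narrower than the spread of the population itself, which is exactly what lets a sample stand in for the whole.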
A general principle follows from these statistical rules, one that has powerful implications for quick counts: the greater the number of observations we have, the more likely it is that we can make reliable statistical predictions about the characteristics of the population. However, it is absolutely crucial to understand that, for these two statistical principles to hold, the selection of the cases in the sample must be chosen randomly.
A sample can be thought of not just as a subset of a population, but as a miniature replica of the population from which it is drawn. The population of every country can be considered as unique in certain respects. No two countries are the same when it comes to how such characteristics as language, religion, gender, age, occupation and education are distributed in the population. Whether an individual possesses a car, or lives in a city rather than a town, or has a job, or owns a pet dog contributes to the uniqueness of personal experience. It is impossible to produce a definitive and exhaustive list of every single feature that distinguishes us as individuals, let alone for entire populations; there are just too many possible combinations of factors to document. Fortunately, quick count methodology does not require this. Quick counts are not concerned with all of the things that make people different. Quick counts are only concerned with factors that have a demonstrable impact on the distribution of votes within the voting population.
Sample points from the relevant population must be selected at random, and only at random, for the resulting sample to be representative of the total population. In practice, randomness means that the probability of any single sample point being selected from the population is exactly the same as the probability that any other sample point will be selected. And for reasons that have already been outlined, the law of large numbers and central limit theorem indicate that the larger the sample drawn, the more accurately that sample will represent the characteristics of the population.
Homogeneity and Heterogeneity
Reliable samples do not require huge amounts of detailed information about the social characteristics of the total population. However, it is essential to know whether the population of interest is relatively diverse (heterogeneous) or not (homogenous). Assessments of heterogeneity and homogeneity have a significant impact on how populations can be reliably sampled.
There are several ways to examine the level of heterogeneity, or diversity, of any population. Ethnic composition, religion and languages can impact heterogeneity. The primary concern for quick counts, however, is not just with the level of ethnic or religious heterogeneity in a population. The vital question for quick counts is the question of whether that heterogeneity has a significant impact on voting behavior. If one candidate is preferred by 80 percent of the population, then that population is considered relatively homogeneous, regardless of the religious, linguistic or ethnic diversity of the population. Similarly, if the electoral race is close, with the votes evenly divided between two or more candidates, a population is considered relatively heterogeneous.
A common misperception is that socially diverse populations will always be heterogeneous voting populations. However, just because populations are socially heterogeneous, it does not follow that they will be heterogeneous when it comes to voting. For example, India has a multiplicity of languages and religions but is relatively homogenous when it comes to constructing a sample of the voting population.
The greater the heterogeneity of the voting population, the larger the sample has to be in order to produce an accurate estimate of voting behavior. A comparison of required sample sizes for three countries with very different population sizes – Canada, the United States and Switzerland – illustrates this point.
As Figure 5-1 demonstrates, heterogeneity is not determined by the ethnic characteristics of these populations. Heterogeneity is determined by the likelihood that one candidate will win a majority of the electoral support. In a two-party system, as in the United States, the electoral race is often easier to follow and much easier to predict – voters usually have only two choices. But in Switzerland, the larger number of parties makes electoral competition more complicated. Swiss political parties are clearly supported by different language and religious groups. Even a country such as Canada, with five official parties, is less heterogeneous than Switzerland.
A related principle is also illustrated in Figure 5-1. The required sample size is determined by the expected level of homogeneity in voting results, not by the total population size of a country. These three countries with very different total populations require different sample sizes to maintain a margin of error of plus or minus two percent (+/-2%). Indeed, it turns out that the country with the largest population requires the smallest sample. In fact, the variations in the required sample size are attributable to variations in the homogeneity of the three different populations.
In practice, reliable information about the heterogeneity, or homogeneity, of voting populations in many countries is hard to find. The safest strategy under these circumstances, one that requires no guess-work, is to make the conservative assumption that the voting population is heterogeneous. As will become clear, that assumption has a profound impact on how a quick count’s sample size is calculated.
Confidence Levels: Specifying the Relationship between Sample and Population
One additional piece of information has an important impact on how statisticians estimate population characteristics on the basis of a sample—the confidence level. Confidence levels concern how confidently results from the sample can be generalized to the population. The more confidence required that the sample distribution will reflect the population distribution, the larger the sample has to be. This is because, in larger samples, exceptional individual results have less effect on the distribution.
The conventional practice for statisticians is to rely on a confidence level of 95 percent. Technically, the confidence level expresses, as a percentage, the probability that a sample mean will provide an accurate estimate of the population mean. Thus, a 95 percent confidence level indicates that 95 percent of all sample means will fall within the margin of error of the population mean. Because the consequences of inaccurate quick count results can be so serious, the standard practice in election observations is to design the sample with more conservative parameters, a 99 percent confidence level.
CONSTRUCTING THE SAMPLE
The practical business of constructing a quick count sample involves making a combination of judgements. These include:
- identifying the unit of analysis;
- determining the margin of error and confidence levels;
- determining the most appropriate type of random sample; and
- estimating correction factors for sample retrieval rates and non-voting.
The Unit of Analysis
The unit of analysis refers to the precise object that is being examined. If the goal is to generalize about an entire population, then the unit of analysis is often the individual. However, it is possible in some cases to generalize from a sample to a population by adopting a larger aggregate as the unit of analysis, such as a household or city block.
With quick counts, the objective is to estimate the distribution of citizens’ votes between political parties. In a democratic election, the individual vote is secret and so the individual vote cannot be the unit of analysis. Instead, quick counts typically use the official result at an individual polling station as the unit of analysis. This is because the polling station is the smallest unit of analysis at which individual votes are aggregated and because election rules usually require that an official count take place at the polling station.
The Margin of Error: How Accurate Do We Need to Be?
The margin of error is one of the most important pieces of information considered when constructing a sample. Expressed as a percentage, the margin of error refers to the likely range of values for any observation. The following example illustrates the concept:
- Results from one polling station indicate that 48 percent of votes support Candidate A. If the designed margin of error is five percent, there is good reason to be confident that the actual results for Candidate A will fall somewhere between 43 and 53 percent when all voters within the population are considered.
Civic organizations conducting quick counts typically design the quick count samples to have a margin of error of plus or minus 0.5 percent (+/-0.5%). There is occasionally a reason (e.g., the expectation that a vote will be very close) to select an even more stringent margin of error. The desired margin of error depends on what degree of accuracy is required from the estimates.
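The polling-station example above amounts to a one-line calculation; the function name here is illustrative:

```python
def estimated_range(sample_share, margin_of_error):
    """Interval within which the population result is expected to fall."""
    return (sample_share - margin_of_error, sample_share + margin_of_error)

# The example from the text: 48 percent for Candidate A, margin of error 5
low, high = estimated_range(48, 5)
print(low, high)   # 43 53
```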
The margin of error is calculated using the following formula:
ME = (s / √n) × z

where:
ME = margin of error
s = standard deviation (conservatively assumed to be 0.5)
n = sample size
z = Z value for the selected confidence level (1.96 for 95 percent; 2.58 for 99 percent)
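The formula can be sketched in a few lines, and inverted to show the minimum sample size a desired margin of error implies. The function names are illustrative, and the calculation deliberately ignores practical refinements such as the correction factors discussed elsewhere in this chapter:

```python
import math

def margin_of_error(n, z, s=0.5):
    """ME = (s / sqrt(n)) * z, using the conservative assumption s = 0.5."""
    return (s / math.sqrt(n)) * z

def required_sample_size(me, z, s=0.5):
    """Invert the formula: n = (z * s / ME) ** 2, rounded up."""
    return math.ceil((z * s / me) ** 2)

# Z values from the text: 1.96 for 95 percent, 2.58 for 99 percent
print(round(margin_of_error(1_000, 1.96), 4))   # 0.031, about 3.1 points
print(required_sample_size(0.02, 1.96))         # 2401 sample points for +/-2%
```

Note how quickly the required sample grows as the margin of error tightens or the confidence level rises.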
Any dataset, a set of sample point observations, has at least two properties. The data will have a central tendency, around which most of the results cluster. They will also have a variance or spread. Variance refers to how widely, or narrowly, observations are dispersed. There are different ways of measuring central tendency and dispersion, and these are relevant in calculations of the margin of error.
Measures of Central Tendency
The most widely known measure of central tendency is the mean. The arithmetic mean is simply the average value of all recorded observations. The arithmetic mean is derived by adding the values for each observation in a data set and then dividing by the number of observations. The following example illustrates this process:
The following set of numbers: 1, 3, 4, 6, 7 and 9 has a mean of 5. This is because 1+3+4+6+7+9=30, the number of observations is 6, and so 30÷6=5.
There are other ways to measure the central tendency of any data. The mode, for example, refers to the number that occurs most frequently in any set of data. In the following set of numbers: 1, 3, 3, 3, 5, 6 and 7, the observation occurring most frequently is 3. Notice, however, that the arithmetic mean of this same set of numbers is 4 [(1 +3+3+3+5+6+7)÷7=4].
A third measure of central tendency is the median. This number occurs in the middle of a given set of observations. For the following data set: 1, 3, 6, 7, 8, 8 and 10, the number in the middle of the observations is 7; there are three observations smaller than 7 and three observations with values that are greater than 7. The mode for this dataset, however, is 8 because 8 occurs most frequently. The arithmetic mean for this data set is 6.14. Statisticians usually report the mean, rather than the median or the mode, as the most useful measure of central tendency.
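The three measures can be checked with Python's standard statistics module, using the dataset from the median example:

```python
import statistics

data = [1, 3, 6, 7, 8, 8, 10]       # the dataset from the text

print(statistics.mean(data))         # 6.142857..., about 6.14
print(statistics.median(data))       # 7
print(statistics.mode(data))         # 8
```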
Measures of Dispersion
A second feature of data concerns measures of dispersion, which indicate how widely, or how narrowly, observed values are distributed. From the example above, it is clear that any given data set will have an arithmetic mean. However, that mean provides no information about how widely, or narrowly, the observed values are dispersed. The following data sets have the same arithmetic mean of 3:
2, 2, 3, 4, 4
-99, -99, 15, 99, 99
These two datasets have quite different distributions. One way to express the difference in the two datasets is to consider the range of numbers. In the first set, the smallest number is 2 and the largest number is 4. The resulting range, then, is 4 minus 2, or 2. In the second set, the smallest number is negative 99 and the largest number is positive 99. The resulting range is positive 99 minus negative 99, or 198.
Obviously, the different ranges of the two datasets capture one aspect of the fundamental differences between these two sets of numbers. Even so, the range is only interested in two numbers – the largest and the smallest; it ignores all other data points. Much more information about the spread of the observations within the dataset can be expressed with a different measure, the variance.
In non-technical terms, the variance expresses the average of the squared distances between each observation value and the mean of all observation values. The variance takes into account the arithmetic mean of a dataset and the number of observations, in addition to each of the data points themselves. As a result, it captures all the information needed to describe the spread of a dataset. The variance for any set of observations can be determined in four steps:
1. Calculate the arithmetic mean of the dataset.
2. Calculate the distance between every data point and the mean, and square the distance.
3. Add all the squared distances together.
4. Divide this sum by the number of observations minus one (n − 1).
The formula, then, is as follows:
For a dataset containing observations x1, x2, x3 … xn:

s² = [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)² + … + (xn − x̄)²] / (n − 1)

where:
s² = variance
x1, x2, x3 … xn are the observations
x̄ is the mean
n is the number of observations

In short form, it appears as: s² = [∑(x − x̄)²] / (n − 1)
The standard deviation is the square root of the variance. Statisticians usually rely on the standard deviation because it expresses the variance in standardized units that can be meaningfully compared. The larger the standard deviation for any dataset, the more the data are spread out from the mean. The smaller the standard deviation, the more tightly are the individual data points clustered around the mean.
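As a minimal sketch, the four-step procedure can be checked on the first dataset from the range example, where the variance and standard deviation both work out to 1:

```python
data = [2, 2, 3, 4, 4]              # first dataset from the range example

mean = sum(data) / len(data)                     # step 1: arithmetic mean
squared = [(x - mean) ** 2 for x in data]        # step 2: squared distances
total = sum(squared)                             # step 3: sum them
variance = total / (len(data) - 1)               # step 4: divide by n - 1
std_dev = variance ** 0.5                        # standard deviation

print(mean, variance, std_dev)   # 3.0 1.0 1.0
```

Python's standard library reaches the same values via statistics.variance(data) and statistics.stdev(data), which also use the n − 1 divisor.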
There is one additional measurement concept that needs to be considered: the normal distribution. The preceding discussion shows that, in every data set, individual data points will cluster around an average, or mean, point. Another way to express the same idea is to consider what proportion of all of the observations fall within one standard deviation of the mean. If datasets are large enough, and if they conform to the principles of randomness, the dispersion of the data values will conform to what is called a normal distribution. The normal distribution has well-known properties: the normal curve, as seen in Figure 5-2, is bell-shaped and symmetrical, and the mean, mode and median coincide.
The size of the variance determines the precise shape of the actual distribution. The key point for quick count purposes is that any dataset that conforms to the normal distribution curve has exactly the same standard properties. These are: 68.3 percent of all observed values will fall within one standard deviation of the mean, 95.4 percent of all results will fall within two standard deviations of the mean and 99.7 percent of all results will fall within three standard deviations of the mean. Not all datasets will conform to this exact pattern. If there is a lot of variance within the data, the curve will be relatively flat. If there is little variation, the curve will appear more peaked.
The distance from the mean, expressed as standard deviations, can also be referred to as Z scores or critical values. Most standard statistics textbooks contain a table of Z values for the normal distribution, so analysts do not have to calculate Z values each time they confront a dataset. Significantly, at a 95 percent confidence level (where 95 percent of all sample means fall within the margin of error of the population mean), results fall within 1.96 standard deviations of the mean. Similarly, a 99 percent confidence level corresponds to results falling within 2.58 standard deviations of the mean. In these cases, the values 1.96 and 2.58 represent the critical values, or Z values, for the confidence levels 95 percent and 99 percent, respectively.
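The standard properties of the normal curve can be verified with the error function in Python's math module, since the share of a normal distribution lying within z standard deviations of the mean equals erf(z / √2):

```python
import math

def share_within(z):
    """Proportion of a normal distribution lying within z standard
    deviations of the mean: erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2))

for z in (1.0, 2.0, 3.0, 1.96, 2.58):
    print(z, round(share_within(z) * 100, 1))
# Prints 68.3, 95.4 and 99.7 percent for 1, 2 and 3 standard deviations,
# and 95.0 and 99.0 percent for the critical values 1.96 and 2.58
```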
Calculating the margin of error requires relying on the standard deviation and Z values. The standard deviation and Z values, in turn, involve measures of central tendency, measures of dispersion and confidence levels. As Figure 5-3 shows, margins of error vary with confidence levels and with sample sizes. In general, the higher the confidence level, the higher the margin of error. The larger the sample size, the lower the margin of error. Decisions about what margin of error can be tolerated with a quick count will directly impact calculations to determine the required minimum sample size.
Types of Samples
There are two basic types of samples: probability samples and non-probability samples. Probability samples comply with the principles of randomness and are, therefore, representative of total populations. Quick counts always use probability samples.
Non-probability samples do not select sample points randomly, and the extent to which they are representative of the wider population is not known. Non-probability samples are useful under some circumstances. They are inexpensive and easier to construct and conduct than probability samples. The cases in the sample are chosen on the basis of how easy or convenient they are to study. For example, a television reporter stands outside a ballpark and asks fans whether they enjoyed a baseball game. The strategy provides quick and interesting footage for broadcast, but it does not provide reliable information about the total population inside the ballpark.
For quick count purposes, the fatal limitation of non-probability samples is that they are not reliable for generalizing to the population. The data they produce, therefore, are not reliable estimates of population characteristics. If, for example, a quick count sample were constructed entirely from polling stations in the capital city, the results would almost certainly be different from those coming from a sample of polling stations in rural areas. People drawing on raw data at convenient, easily accessible, locations are not using data that are representative of the population as a whole.
Quick counts must always use probability samples to produce results that are representative of a defined population. There are several types of probability samples, and each can provide accurate representations of the population by relying on different methods. The two most common types of probability samples are the general random sample and the stratified random sample.
General Random Samples
In the general random sample, units of analysis are randomly selected one at a time from the entire population. This gives each unit in a population an equal chance of being included in the sample. However, for every unit of analysis to have an equal chance of being included in the sample, there must be an accurate list of all possible units of analysis.
Statisticians refer to the list of all members of a population as a sampling frame. In the case of a quick count, the unit of analysis is the polling station; therefore, the sampling for a quick count can only begin when an accurate and comprehensive list of all the polling stations is available.
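To make the mechanics concrete, a general random sample can be sketched in a few lines of Python. The station identifiers and the fixed seed below are hypothetical illustrations, not part of any actual quick count:

```python
import random

# Hypothetical sampling frame: a complete list of polling station identifiers.
sampling_frame = [f"station-{i:05d}" for i in range(1, 90_781)]

# General random sample: every station has an equal chance of selection,
# and stations are drawn without replacement (no duplicates).
rng = random.Random(42)  # fixed seed so the draw can be reproduced and audited
sample = rng.sample(sampling_frame, k=1_020)

print(len(sample))  # 1020
```

In practice the draw would be conducted and documented publicly, so that political parties and observers can verify that every station had an equal chance of selection.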
Stratified Random Samples
The stratified random sample applies the same principles of randomness as the general random sample. However, the sampling frames from which the sample points are selected consist of predetermined and mutually exclusive strata of the total population. For example:
The goal of a project is to use a sample of 1,000 students to generalize about a university population of 20,000 students, half of whom are undergraduate students and half of whom are graduate students. Whereas the general random sample approach simply selects 1,000 sample points at random from the total list of 20,000 students, the stratified sample approach follows two steps. First, it divides the list of all students into two groups (strata), one including all undergraduate students and the other including all graduate students. Next, it selects 500 cases from stratum 1 (undergraduates) and another 500 cases from stratum 2 (graduates).
In the stratified approach, the selection of each case still satisfies the criteria of randomness: the probability of selection for each case within each stratum is exactly the same (in the above example, 500 in 10,000, or 1 in 20). However, stratifying means that the end result will be a total sample that exactly reflects the distribution of cases in the population as a whole. In effect, the stratification procedure predetermines the distribution of cases across the strata.
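The two-step student example above can be sketched in Python (the population lists are hypothetical placeholders):

```python
import random

# Hypothetical university population: 10,000 undergraduates and 10,000 graduates.
undergraduates = [("undergraduate", i) for i in range(10_000)]
graduates = [("graduate", i) for i in range(10_000)]

rng = random.Random(7)

# Stratified draw: select randomly *within* each stratum, 500 cases apiece,
# so the final sample exactly mirrors the 50/50 split in the population.
sample = rng.sample(undergraduates, 500) + rng.sample(graduates, 500)

print(len(sample))  # 1000
```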
Stratification may be useful in another way. Some observer groups do not have the resources to conduct a nation-wide observation. In that case, the observer group might want to limit its observation to a particular stratum of the country, perhaps the capital city or a coastal region. In these instances, with a randomly selected set of sample points within a stratum, the observer group can generalize the results of the observation to the entire stratum it covers.
Determining Sample Size
To determine the sample size for a quick count (i.e., how many polling stations should be included in the sample), analysts proceed through several steps. They identify the size of the relevant population (the number of eligible voters); determine the level of homogeneity within that population; and select the desired level of confidence and margin of error. Next, analysts calculate the sample size as follows:
- n = [P(1-P)] / [(∑² / z²) + (P(1-P) / N)]

Where n = size of the sample (number of eligible voters)
P = assumed level of homogeneity of the population (between 0 and 1, so 50% = 0.5)
∑ = margin of error (between 0 and 1, so 0.32% = 0.0032)
z = z value at the chosen level of confidence for a normal distribution (here, 99 percent)
N = size of the total population
The case of a quick count conducted during the 2001 Peruvian presidential elections can illustrate the above steps:
- The size of the total relevant population (number of eligible voters) in Peru was 14,570,774. The race between the two candidates was expected to be close, so the population was assumed to be heterogeneous and the level of homogeneity was set at 50 percent, the most conservative assumption. A margin of error of 0.32 percent and a confidence level of 99 percent were selected. For the purposes of the calculation, the level of homogeneity and the margin of error were expressed as values between 0 and 1: 50 percent as 0.5 and 0.32 percent as 0.0032. These values were plugged into the formula.
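The calculation can be sketched in Python. One assumption is worth flagging: the chapter's figure of 163,185 voters is reproduced almost exactly if the 99 percent z value is rounded to 2.60; standard tables give 2.576, which yields a slightly smaller sample.

```python
import math

def sample_size(P, moe, z, N):
    """Required sample size with a finite-population correction:
    n = P(1-P) / [ (moe^2 / z^2) + P(1-P)/N ]"""
    return (P * (1 - P)) / ((moe ** 2 / z ** 2) + (P * (1 - P) / N))

# Inputs for the 2001 Peru quick count, as given in the text.
N = 14_570_774   # eligible voters
P = 0.5          # most conservative homogeneity assumption
moe = 0.0032     # margin of error of 0.32 percent
z = 2.60         # assumed rounding of the 99 percent z value (tables give 2.576)

n_voters = sample_size(P, moe, z, N)
n_stations = math.ceil(n_voters / 160)  # roughly 160 voters per station

print(round(n_voters))  # close to the 163,185 quoted in the text
print(n_stations)       # 1020
```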
At this point, analysts know how many voters have to be consulted. However, the units of analysis are not individual voters; they are polling stations. Therefore, the next step is to determine how many polling stations must be selected to represent the required number of voters. The Peruvian calculation can be continued to illustrate the point:
- On average, there were approximately 160 voters per polling station in Peru. The required sample of 163,185 eligible voters was therefore divided by the average number of voters per station (160), yielding approximately 1,020 stations. Consequently, the sample size for the 2001 Peru quick count was 1,020 polling stations.
Selecting the Sample Points
Once the required size of the random sample is known, the sample can be selected from the sample frame. For quick counts, polling stations (the sample points) are selected from the complete list of polling stations (the sample frame). The simplest way to do this is to use a computer program that makes random selections. However, this task can also be accomplished without a computer. The first step involves dividing the total number of polling stations by the desired number of polling stations to obtain a sampling interval, and the second step requires determining a random starting point. Again, the numbers from the 2001 quick count in Peru can be used to illustrate how this is done:
On election day, the Peruvian universe consisted of 90,780 polling stations. First, the total number of polling stations is divided by the desired number of stations in the sample (90,780÷1,020 = 89). This indicates that one in every 89 polling stations needs to be selected. Second, a random starting point is selected by placing 89 slips of paper, numbered 1 to 89, in a hat, and randomly selecting a piece of paper. The piece of paper selected contains the number 54. The 54th polling station on the randomly ordered list is the first sample point, then every 89th polling station after that first sample point is selected. Thus the second polling station in the sample is the 143rd polling station on the list (54 plus 89). The procedure is repeated until the total sample size of 1,020 is reached.
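The interval-and-random-start procedure can be sketched as follows (the seed here is arbitrary; in a real draw, the starting number would be selected publicly, as described above):

```python
import random

rng = random.Random(2001)

# Hypothetical, randomly ordered list of the 90,780 polling stations.
stations = list(range(1, 90_781))
rng.shuffle(stations)  # random ordering protects against bias in the list

interval = 90_780 // 1_020        # = 89: select one in every 89 stations
start = rng.randint(1, interval)  # random starting point between 1 and 89

# Take the start-th station, then every 89th station after it.
sample = stations[start - 1 :: interval]

print(interval)     # 89
print(len(sample))  # 1020
```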
Why does the list of polling stations have to be ordered randomly? This strategy further protects the validity and reliability of the quick count. If the original list is organized by size, region, or other criteria, the results of a simple draw could be biased. Usually this is not a serious concern, but random ordering is a technique that provides additional assurance that the probability of the selection of each point in the sample is equal to the chances of any other point being selected.
It is sometimes necessary to make adjustments to various elements of the quick count methodology. These adjustments apply to volunteer recruiting and training and to more technical elements of the quick count, including sampling. The sample calculations outlined above usually require some additional adjustment. This is because it is assumed initially that all sample points will be identified and that data will be delivered from each and every point. In practice, however, no large-scale quick count undertaken by any observer group has ever been able to deliver data from every single data point in the original sample.
In quick count situations, it is important to draw a distinction between a theoretical sample and a practical sample. Most theoretical discussions of sampling assume that, once a sample point is selected, data from that sample point will be generated with 100 percent efficiency. This assumption has never been satisfied in any large-scale national quick count. This is due to any combination of factors including mistakes made by inadequately trained observers, breakdowns in communication systems or unforeseen election-day developments. (For instance, observers are sometimes prohibited from entering polling stations; inclement weather might prevent observers from reaching a telephone or prevent data from being reported.)
Civic organizations undertaking a quick count for the first time are, on average, able to deliver about 75 percent of the data from sample points within a reasonable time frame of about three hours. The 25 percent of the sample that is not reported (the missing data) can lead to problems with the interpretation of the remaining data. The practical, usable sample, therefore, is always smaller than the theoretically designed sample, and the margins of error that apply to the practical sample are necessarily larger than planned.
In a closely contested election, missing data can be a very serious matter. Moreover, these missing data are hardly ever just a random cross-section of the total sample. In practice, the proportions of missing data are nearly always greater from remote areas where data are most difficult to recover. If the missing data are not random or representative, they are biased. And if the missing data are biased, so is the remaining sample.
What is the best way to prepare for the fact that not all of the sample will be recovered on election day? The solution must be built into the original sample design; it is to oversample by the margins of the expected recovery rate.
An experienced observer group might estimate a data recovery rate of 80 percent of the sample points in the theoretical sample. In this case, the practical sample would be 20 percent smaller than the theoretical sample. The most direct way to address this potential problem is simply to increase the sample size by 20 percent, randomly adding 20 percent more sample points to the sample that is first calculated. Such a straightforward strategy would work if the deficit in sample recovery were distributed randomly throughout the population. However, experience indicates that the deficit is usually unevenly distributed among the capital city, other urban areas and rural areas. Recovery is most difficult in remote areas, and the design of a corrected oversample component must take this into account. Figure 5-4 shows the distribution of a typical sample recovery pattern and the corrected oversample component. As Figure 5-4 indicates, the additional correction for uneven sample recovery would place at least half of the oversample in the rural areas.
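A minimal sketch of a corrected oversample, using hypothetical recovery rates and a hypothetical regional allocation of the 1,020 stations (real figures would come from the group's own experience and the actual sample design):

```python
import math

# Hypothetical expected data recovery rates, by region.
recovery_rate = {"capital": 0.95, "other urban": 0.85, "rural": 0.65}

# Hypothetical allocation of the 1,020-station sample across regions.
base_allocation = {"capital": 300, "other urban": 320, "rural": 400}

# Oversample each region so that the *expected* number of recovered
# stations matches the base allocation: stations needed = base / recovery rate.
oversample = {
    region: math.ceil(base_allocation[region] / rate) - base_allocation[region]
    for region, rate in recovery_rate.items()
}

print(oversample)  # the bulk of the oversample lands in the rural stratum
```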
Correcting for Polling Station Size
Sometimes it is necessary to adjust the margin of error for quick count results due to practical considerations. For example, the size of the polling station—the total number of voters expected at the polling station—will affect the margin of error. This stems from the difference between the defined population and the unit of analysis. Recall that the original calculation of the margin of error relied on the total number of eligible voters. This was done to ensure that the sample design satisfied certain statistical principles. However, since polling stations are the unit of analysis, it is useful to revise the margin of error based on the number of voters in polling stations. In the previous example, an average of 160 voters was assigned to each polling station. It is important to consider that polling stations come in different sizes. If polling stations included 200 voters, fewer stations would have been needed for the sample; if the polling stations were even larger, with 500 voters, fewer still would have been needed.
As Figure 5-5 illustrates, the number of polling stations and the number of voters in a polling station will have an effect on the margin of error. This is attributable to the role of the sample size in constructing the margin of error. Recall that the formula for margin of error is:
- Margin of error = [√(P(1-P)) × z] / √n
- Here √(P(1-P)) reflects the assumed heterogeneity of the population, z is the z value at the chosen confidence level, and n is the sample size. Variations in the size of polling stations will also affect the ‘n.’
Notice that the margin of error depends on the number of polling stations in the sample. If the polling stations are large, fewer of them are needed to generate the desired sample of 163,185 voters. The margin of error calculated for the polling stations is larger than the margin of error calculated for the sample of voters, and the resulting margin of error for quick counts falls somewhere between these two values.
Tracking the changes to the margin of error for a range of polling station sizes shows that, as the number of stations needed to form a sample of voters decreases, the margin of error increases. Figure 5-6 illustrates the relationship between the size of the polling station and the margin of error.
The margin of error increases as polling station size increases. The overall effect of polling station size on margin of error, however, decreases as both rise. Figure 5-7 illustrates this point.
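Under the earlier assumptions (P = 0.5 and the 99 percent z value rounded to 2.60), the station-level margin of error for different station sizes can be sketched as follows; the exact figures in Figures 5-6 and 5-7 may differ slightly with rounding:

```python
import math

Z99 = 2.60               # assumed 99 percent z value, as in the earlier sketch
P = 0.5                  # most conservative homogeneity assumption
VOTERS_NEEDED = 163_185  # target sample of voters from the text

def margin_of_error(n):
    # margin of error = sqrt(P(1-P)) * z / sqrt(n)
    return math.sqrt(P * (1 - P)) * Z99 / math.sqrt(n)

# Larger stations mean fewer stations are needed -- and a larger
# station-level margin of error.
for station_size in (160, 200, 500):
    stations = math.ceil(VOTERS_NEEDED / station_size)
    print(station_size, stations, round(margin_of_error(stations), 4))
```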
Correcting for Turnout
When elections are very close, quick count analysts must also be concerned with the level of voter turnout. Even if observers succeed in retrieving data from each of the 1,020 polling stations in the theoretical sample, low voter turnout will mean that fewer votes are included in the sample than if turnout had been high. The original calculation was based on the expectation of some 160 votes per polling station. If turnout were only 70 percent, however, there would be just 112 votes at each polling station. Repeated across the 1,020 polling stations, the count would include only 114,240 votes, nearly 49,000 shy of the 163,185 needed to achieve a margin of error of 0.32 percent at a confidence level of 99 percent.
Consequently, a cautious data interpretation strategy calls for re-calculating the margin of error based on the actual number of votes counted. Figure 5-8 illustrates this point.
As the table shows, as turnout decreases, the margin of error increases. If turnout is above 60 percent, margin of error will increase by approximately 0.02 percent for every 10 percent drop in turnout. As turnout approaches 50 percent, the increase in margin of error is much greater. A graph of the increase in margin of error corresponding to decrease in turnout is presented in Figure 5-9.
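This recalculation can be sketched as follows, again assuming a 99 percent z value of roughly 2.60 and the Peruvian sample design; the exact figures in Figures 5-8 and 5-9 may differ slightly with rounding:

```python
import math

Z99 = 2.60                # assumed 99 percent z value
STATIONS = 1_020          # polling stations in the sample
VOTERS_PER_STATION = 160  # expected voters per station at full turnout

def margin_of_error_at(turnout):
    votes = STATIONS * VOTERS_PER_STATION * turnout
    return math.sqrt(0.25) * Z99 / math.sqrt(votes)  # P = 0.5 assumed

# The margin of error grows as turnout falls, and the growth steepens
# as turnout approaches 50 percent.
for turnout in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    print(f"{turnout:.0%} turnout: {margin_of_error_at(turnout):.4%} margin of error")
```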
This chapter has laid out the broad statistical principles underlying quick counts for a general audience, and it has outlined the statistical foundations of the quick count methodology. Organizers should understand this methodology, particularly the concepts of reliability and validity, as well as why a sample must meet the criteria for randomness. This knowledge is vital to the design of effective and reliable observer forms and training programs. It also underscores the importance of preparing to retrieve data from every part of the country—even the most remote areas.
Finally, this chapter also has considered the more technical matters of how sample sizes can be calculated, and how such issues as levels of confidence, margins of error and the heterogeneity or homogeneity of the population shape the sample. Most observer groups seek the services of a trained statistician to construct and draw a sample and to analyze the data on election day. Civic groups must realize that the quick count is a matter of applying statistical principles to practical, unique circumstances where standard textbook assumptions may not be satisfied. For that reason, the chapter outlines the most common correction factors that analysts should take into account when interpreting the data that are successfully retrieved on election day.
1 This chapter focuses on the statistical principles involved in drawing a random sample of polling stations, from which data is collected and analyzed to project or verify election results. Quick count methodology, however, has evolved, and the same statistical principles now drive the qualitative observation of an election. Chapter Six, The Qualitative Component of the Quick Count, describes how information on the voting and counting processes can be collected from the same observers and the same polling stations used to retrieve data on the vote count. These findings can be reliably generalized to the quality of the voting and counting processes throughout the country.
2 Quick counts also measure qualitative aspects of voting and counting processes, and, as discussed in Chapter 6: The Qualitative Component of a Quick Count, great care is required in designing questions to measure qualitative indicators.
3 It should be noted that the democratic nature of an election can be negated by improper, discriminatory exclusions from voting eligibility and/or by manipulations of official voter lists. Such issues are not addressed by quick counts but should be covered by other election monitoring activities. See, e.g., Building Confidence in the Voter Registration Process, An NDI Guide for Political Parties and Civic Organizations (2001).
4 Chapter Six, The Qualitative Component of the Quick Count, details the procedures followed to systematically evaluate the quality of the voting and counting procedures.