Collecting and Analyzing Quick Count/PVT Data
By this stage, the random sample has been drawn, observer forms have been developed and distributed, and observers have been recruited and trained. On election day, observers take up positions at assigned polling stations and get ready to collect and report the data. This chapter deals with the next steps. It begins with a discussion of data reporting protocols, explaining how and when observers report data on the quality of the process and on the vote count results. Some of the problems associated with information flows on election day, and practical solutions to those problems, are discussed. The chapter then examines the important question of how the recovered data are used, particularly with regard to the vote count. It considers the main strategies for analyzing data on vote totals and the steps taken to ensure that the results released will be reliable. It concludes with a discussion of how and when quick count findings can be released.1
DATA REPORTING PROTOCOLS
On election day, domestic observers usually make two reports. For the first report observers use a questionnaire similar to Form 1 illustrated in Chapter Six. Form 1 contains information about whether proper procedures have been followed during the opening of polling stations. This first qualitative report is made after the polling stations have opened, usually immediately after the first voter in line has voted. The second report comes from a form similar to Form 2, also illustrated in Chapter Six. This provides qualitative data on the voting procedures and the closing of the polling stations, as well as data on the vote count. The common practice is for observers to report these data immediately after polling stations have produced an official result. In most cases, a polling station result is “official” after the polling station officials and the party agents present at the count have signed the public document that records the vote totals for that particular polling station.
This chapter focuses primarily on the official vote data (Form 2), but there are broad issues of data reporting that apply to all observer reports. So, the place to begin is with general guidelines that apply to both the first and second reports.
For each report, observers make three separate calls; they report the same data to three different locations.
- Call # 1: Observers make the first call directly to the central data collection center.
- Call # 2: Observers make the second call to their assigned regional coordinator.
- Call # 3: Observers make the third call to a back-up network of private telephones in the capital city.
The main challenge of a quick count is to collect, deliver, assemble and analyze large volumes of information—and to do so reliably and quickly. Because the effectiveness of quick counts requires efficient information flows, it is important to have a very clear idea about exactly how election day information flows will work. In fact, there are two sets of information flows to consider. The first has to do with the information flows from observers in the field to the data collection center. Then there are the information flows within, and from, the data collection center. Both of these sets of information flows are monitored through the central database. In effect, it is through the database that information traffic can be directed in ways that maximize the efficiency of data recovery on election day.
Information Flows from the Field
The experiences of groups that have conducted quick counts provide two very clear lessons about information flows, and each of these has important logistical and analytic implications that need to be clearly understood.
First, on election day, there are very substantial fluctuations in the volume of information flows from observers in the field to the data collection center. The typical pattern, summarized in Figure 7-1, is based on real data gathered from a recent Latin American election. In that particular case, the election law required that polling station officials open the polling stations by 7:00 a.m. Observers were asked to be present at the polling station by 6:15, some 45 minutes before polling stations were due to open. They were asked to report their Form 1 data, the qualitative data, immediately after the first voter had voted at their polling station.
This pattern of fluctuations in the volumes of information is essentially the same for both the qualitative and the numeric data. At 7:00, the data collection center receives no information at all. Information begins to trickle in to the data collection center after the first thirty minutes, between 7:30 and 8:00. The earliest data come from the most efficiently run polling stations and from those where observers have easy access to telephones. By 8:30, the number of phone calls into the data collection center has increased dramatically, and by 9:00 that trickle has turned into a deluge. In this particular case, calls were arriving at the data collection center at a rate of some 55 calls per 10 minutes, or 5.5 calls a minute. After that peak period, the volume of calls coming into the data collection center starts to fall off, and then it slows down dramatically.
These uneven information flows present a logistical challenge. The task is to develop a strategy that anticipates—and then effectively manages—the peak volume of information intake. At issue are two questions. Does the group have the communications capacity to accept all the calls during the peak period?
More critically, are there information bottlenecks or breakdowns that could lead to information losses? Information losses are extremely serious for two reasons. First, they amount to an unnecessary waste of organizational time and effort. The practical issue is clear; there is no point in recruiting and training observers and asking them to report data if the communications system does not have the capacity to receive the data. Second, information losses mean that the effective size of the sample is reduced, and for reasons outlined in Chapter Five, it is clear that reducing effective sample size means increasing the margins of error of the quick count results. More technically, it means that the usable sample becomes a less reliable basis for estimating unknown population characteristics.
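The link between lost reports and a wider margin of error can be made concrete with a short sketch. It assumes, for illustration only, the textbook margin-of-error formula for a proportion under simple random sampling, MoE = z·sqrt(p(1−p)/n); real quick count samples are clustered by polling station, so the Chapter Five calculations take precedence. The function name and the sample figures are hypothetical.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from n
    independent observations (simple random sampling is assumed)."""
    if n <= 0:
        raise ValueError("effective sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)


designed = 1000                                 # hypothetical designed sample size
lost_share = 0.20                               # hypothetical share of reports lost
effective = round(designed * (1 - lost_share))  # effective sample actually received

before = margin_of_error(designed)
after = margin_of_error(effective)
print(f"designed n={designed}: +/-{before:.2%}")
print(f"effective n={effective}: +/-{after:.2%}")
# The margin scales with 1/sqrt(n), so losing 20% of reports widens it
# by a factor of sqrt(1000/800), about 1.118.
print(f"ratio: {after / before:.3f}")
```

Whatever the exact design, the direction is the same: every report that never reaches the data center shrinks n and widens the margin.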
The second lesson learned is that, on election day, information flows into the data center at uneven rates from different regions of most countries. (See Figure 7-2.) There is no mystery about why these regional variations are so dramatic. Information from the capital city nearly always arrives first, mostly because the communications infrastructure there is far better than in rural areas and observers can reach telephones much more easily. Information from rural and remote areas, by contrast, is usually the last to arrive because communications infrastructure is typically poor, and observers often have to travel great distances to reach telephones or radios. These uneven regional distributions of information flows have both organizational and analytic implications.
Because we know ahead of time that information flows are likely to be uneven in these two respects, it is important to take steps that will both maximize and protect our effective sample by managing the information flows more efficiently.
Strategies for Managing Information Flows from the Field
Most groups plan to report quick count data to data collection centers by telephone, if at all possible. The sample size determines the total number of calls that will flow through the data reporting system on election day. The configuration and capacity of the telephone system have to be designed to manage the volume of information that is likely to arrive via telephone lines. More importantly, the telephone system has to be able to manage the peak volume of data flows. The following example illustrates how the volume of data is calculated.
- A quick count observation in one country uses a sample of 600 polling stations, and each telephone call takes, on average, about four minutes to transmit the observer information. This means that the volume of information to be transmitted is 600 x 4, or 2,400, telephone line minutes. In an ideal world, it might be possible to design a communications system in which each data point in the sample has its own dedicated telephone number (in this example, 600 telephone lines). This is not necessary; it is inefficient, and it is very expensive. Where the local telephone system allows it, a better strategy is to (1) estimate what the peak volume of calls will be and then (2) design a communications system that has the capacity to manage the volume of information at that estimated peak load.
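The arithmetic in this example can be sketched in Python. Total line-minutes is simply sample size times average call length. For the peak, the sketch borrows the standard Erlang B formula from telephone traffic engineering (a swapped-in technique, not something the quick count method prescribes) to estimate how many pooled lines keep the share of callers who hit a busy signal below a chosen target. The peak rate of 5.5 calls per minute is taken from the Figure 7-1 example purely for illustration; the 1 percent blocking target and the function names are assumptions.

```python
def total_line_minutes(sample_size: int, minutes_per_call: float) -> float:
    """Total telephone line-minutes needed to receive one report per sample point."""
    return sample_size * minutes_per_call


def erlang_b(load_erlangs: float, lines: int) -> float:
    """Probability that an arriving call finds all lines busy (Erlang B),
    computed with the standard numerically stable recursion."""
    blocking = 1.0  # with zero lines, every call is blocked
    for k in range(1, lines + 1):
        blocking = load_erlangs * blocking / (k + load_erlangs * blocking)
    return blocking


def lines_needed(load_erlangs: float, max_blocking: float) -> int:
    """Smallest number of pooled lines whose Erlang B blocking meets the target."""
    lines = 1
    while erlang_b(load_erlangs, lines) > max_blocking:
        lines += 1
    return lines


# Figures from the text: 600 sample points, about 4 minutes per call.
print(total_line_minutes(600, 4))  # 2400

# Assumed peak: 5.5 calls per minute (Figure 7-1 example) x 4 minutes per call.
peak_load = 5.5 * 4  # offered load in erlangs
print(lines_needed(peak_load, max_blocking=0.01))
```

Erlang B assumes that callers who hit a busy signal give up, whereas observers will redial, so the result is best treated as a floor rather than a precise figure. It also models the lines as a single pool, which is closer to a cascading arrangement than to separate single-line numbers.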
Generally, the most efficient telephone system to use is what is called a “cascading” telephone number system. Here, observers are provided with one phone number to call, but that phone number will automatically transfer and re-route observer calls to the next available free line. Cascading telephone number systems may have as many as twenty lines dedicated to a single number. This system is most efficient because it decreases the likelihood that callers will get a “busy” signal when they call the number.
“Single-number/single-line” systems are more common but far less efficient. First, they require more telephone numbers. Second, observers need to be provided with a list of alternative numbers to call in case the first number they are assigned turns out to be “busy.” The onus is then on the observer to find an open line from the list of numbers. Unless the data center telephone numbers are carefully assigned to each observer, observers may have to call the same number repeatedly until that particular line is free. This wastes valuable time. In single-number/single-line telephone systems, the more efficient practice is to assign no more than fifteen observers to the same data center telephone line and to provide each observer with a list of up to five alternative telephone numbers to call. If this strategy is followed, it is important to rotate the order of the alternative numbers provided to each of the fifteen observers. Observers tend to use the first number at the top of the list they are given, so rotating the numbers on these lists decreases the likelihood that several observers will be calling the same number at the same time. Careful planning is required to reduce the chances of information bottlenecks on single-number/single-line telephone systems.
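The rotation rule can be sketched as follows. The sketch assumes, hypothetically, that one group of up to fifteen observers shares the same set of up to five alternative numbers, and it shifts that list by one position per observer so the number at the top varies from one observer to the next. The observer IDs, phone numbers and function name are placeholders.

```python
from typing import Dict, List


def rotated_call_lists(observers: List[str], alternatives: List[str]) -> Dict[str, List[str]]:
    """Give each observer the same alternative numbers, rotated so that the
    number at the top of the list differs between consecutive observers."""
    if len(observers) > 15:
        raise ValueError("guideline: no more than fifteen observers per line")
    if not 1 <= len(alternatives) <= 5:
        raise ValueError("guideline: between one and five alternative numbers")
    lists = {}
    for i, observer in enumerate(observers):
        shift = i % len(alternatives)  # wrap around once every number has led a list
        lists[observer] = alternatives[shift:] + alternatives[:shift]
    return lists


observers = [f"OBS-{n:02d}" for n in range(1, 16)]  # hypothetical observer IDs
alternatives = ["555-0101", "555-0102", "555-0103", "555-0104", "555-0105"]  # placeholders
lists = rotated_call_lists(observers, alternatives)
for obs in observers[:3]:
    print(obs, lists[obs])
```

With fifteen observers and five numbers, each number heads exactly three lists, so the first-choice calls are spread evenly across the lines.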
Installing large numbers of telephone lines in any one facility, and doing so at short notice, is often a challenge. For bureaucratic reasons, it may take a long time to order telephones and have the lines installed. Or buying or renting and installing the needed number of lines may simply be too expensive. Even when it is possible to install the necessary number of land lines, they may not be dependable. For these reasons, alternative ways of delivering observer information to data collection centers should be considered.
Recall that a substantial proportion of the data reported early tends to come from observers who are located in the capital city region. If the data collection center is located in the capital city, then one alternative to consider is the possibility of hand-delivering observer data to the data collection center. For example, organizers might consider having volunteers on motorcycles pick up the data from observers at pre-arranged collection points and times throughout the city. If one third of a country’s voters (and so, about one third of the sample) live in the capital city, then using such an alternative data delivery system to complement direct phone calls can substantially reduce the information load on telephone lines and the number of required telephones.
Strategies involving the hand delivery of data are, of course, labor intensive and require careful coordination and supervision, but they can be effective. In Malawi’s 1999 quick count, 16 vehicles drove circuits from three locations, picking up observer reports and delivering them to those locations. The forms were then faxed to a central data collection center.2
There are other alternatives to hard-wired telephones to consider. These might include the use of cell phones, solar phones, satellite phones, and radio and fax systems. Each alternative has its own combination of advantages and drawbacks.
In most developing countries, observer groups do not have the luxury of entirely efficient and adequate telephone communications systems. For that reason it is important to evaluate the adequacy of the existing communications system well in advance. The next step is to calculate the load and distribution requirements for the quick count communications effort. The final step is to configure a quick count communications system strategically around what is available, so that the system can adequately manage the information load of the quick count. This may mean patching together a combination of communications channels for the delivery of observation data.
Information Flows within the Data Collection Center
After observers have recorded quick count data at their polling station, they make their first telephone call directly to the data collection center. Figure 7-3 illustrates the pathways of information flows at the data collection center. After the identity of the caller has been verified (using a security code word or set of numbers), the call from the observer is accepted and the observer information is recorded by telephone operators at the data collection center.
Precisely how these data are recorded depends on what kind of technology is available to the observer group. Where there is little access to technology, a pen-and-paper approach can be enough: phone operators simply enter the phone data by hand onto forms. Where more sophisticated technology is available, observers’ calls may be routed directly to the data entry facility, where operators using headsets enter the data directly into the database while the observer remains on the line. Keeping observers on the line while the data are entered is more efficient, and it reduces data losses.3
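When operators key in data while the observer is still on the line, simple checks can flag problems before the call ends. The sketch below is hypothetical: the field names, the code-word lookup and the rule that total votes may not exceed registered voters are illustrative assumptions, not a prescribed Form 2 layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Form2Report:
    station_id: str
    code_word: str           # security code given by the caller
    registered_voters: int
    votes: Dict[str, int]    # hypothetical layout: party name -> votes counted


def entry_problems(report: Form2Report, expected_codes: Dict[str, str]) -> List[str]:
    """Return problems the operator should resolve with the observer before hanging up."""
    problems = []
    if expected_codes.get(report.station_id) != report.code_word:
        problems.append("caller could not be verified")
    if any(v < 0 for v in report.votes.values()):
        problems.append("negative vote count")
    if sum(report.votes.values()) > report.registered_voters:
        problems.append("votes exceed registered voters")
    return problems


codes = {"PS-0417": "condor"}  # hypothetical security code per sample point
call = Form2Report("PS-0417", "condor", 350, {"Party A": 140, "Party B": 230})
print(entry_problems(call, codes))
```

Because the observer is still on the line, any flagged figure can be re-read from the official result sheet immediately rather than chased later by the data recovery unit.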
Follow the pathways in Figure 7-3 indicated by the solid arrows that go from Call #1 through to data entry. Notice that immediately after the data have been entered, the information is routed directly to the database. The database accepts these observation data and stores the data within a subfile that is attached to a larger database. That larger database contains a great deal of information that is vital to the entire observation. It is by linking the newly received observer data with these other stored data that the database can be used to direct information flows instantaneously within the data collection center.
The Master Database
The master database, a computerized store of information, can be developed during the very first phases of organizing for an election observation. In fact, the database should be developed from the moment observers are first recruited. This database is an important basic resource that can be used for tracking recruiting and training, as well as for monitoring election day information flows. The database contains information, stored as records, for each and every volunteer observer. It usually includes each observer’s name, address and contact telephone numbers; whether and when the observer has been trained; when the observer was sent election day observer materials; and when the observer received those materials.4 The database also contains the name, location, address and contact telephone numbers of the regional coordinator to whom the observer reports (Call #2), and it contains the same information for the back-up private telephone to which the observer will make Call #3. Most crucially, the database also contains the number and location of the polling station to which the observer is assigned.
With these pieces of information in a single computer record, the database becomes an extremely efficient tool for retrieving and linking key pieces of information. For example, recruiters can consult the database to track how well recruiting is proceeding. Trainers can refer to the database to find out who has been trained and how to contact people who need to be trained. The organization can use the database as a source of addresses for mailings to volunteers. Regional coordinators can use the database to keep in touch with observers who report to them and to identify those observers who are collecting data from the sample points in the quick count.
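A minimal sketch of such a master database, using Python's built-in sqlite3 module, is shown below. The table layout, column names and sample records are assumptions made for illustration. The point is that one record links an observer to training status, coordinator, back-up phone and assigned polling station, so a single query can answer a trainer's question such as "who still needs training, and who is their coordinator?"

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coordinator (
    coord_id INTEGER PRIMARY KEY,
    name TEXT,
    phone TEXT
);
CREATE TABLE observer (
    obs_id INTEGER PRIMARY KEY,
    name TEXT, address TEXT, phone TEXT,
    trained_on TEXT,                                     -- NULL until trained
    materials_sent TEXT, materials_received TEXT,
    coord_id INTEGER REFERENCES coordinator(coord_id),   -- Call #2
    backup_phone TEXT,                                   -- Call #3
    station_id TEXT                                      -- assigned sample point
);
""")
conn.execute("INSERT INTO coordinator VALUES (1, 'Regional Coordinator North', '555-0200')")
conn.executemany(  # hypothetical records
    "INSERT INTO observer (obs_id, name, phone, trained_on, coord_id, backup_phone, station_id) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Observer One", "555-0301", "2024-05-02", 1, "555-0401", "PS-0417"),
        (2, "Observer Two", "555-0302", None, 1, "555-0402", "PS-0522"),
    ],
)

# Trainers: which observers have not been trained, and who coordinates them?
untrained = conn.execute("""
    SELECT o.name, o.phone, c.name
    FROM observer AS o JOIN coordinator AS c USING (coord_id)
    WHERE o.trained_on IS NULL
""").fetchall()
print(untrained)
```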
In addition to these general day-to-day operational uses, the database is an extremely valuable tool for guiding information flows within the data collection center on election day. Refer again to Figure 7-3. Notice that immediately after data from observers are entered by the data entry operators, the information is directly entered into the database. A computer program then re-directs the quick count observation data simultaneously to three locations: to the statistical analysis unit, the wall chart and the data recovery unit. In the statistical analysis unit, data become available for analysis. Volunteers working on the wall chart record which polling stations in the sample have reported in their data, and keep a running tally of the arrival of reports from the polling stations in the sample. Volunteers in the data recovery unit track each sample point that has NOT reported.
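The fan-out step can be sketched as a single handler that, for each newly entered report, passes the data to the analysis unit, updates the wall-chart tally and thereby removes the sample point from the data recovery unit's outstanding list. The class and attribute names are hypothetical; in practice each destination might be a separate screen, printout or process.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataCenter:
    sample: Set[str]                                         # every sample point
    analysis_queue: List[dict] = field(default_factory=list)  # statistical analysis unit
    reported: Set[str] = field(default_factory=set)          # wall chart

    @property
    def outstanding(self) -> Set[str]:
        """Sample points the data recovery unit still has to chase."""
        return self.sample - self.reported

    def receive(self, station_id: str, data: dict) -> None:
        """Route one entered report to all three destinations."""
        if station_id not in self.sample:
            raise KeyError(f"{station_id} is not a sample point")
        self.analysis_queue.append({"station_id": station_id, **data})
        self.reported.add(station_id)

    def tally(self) -> str:
        return f"{len(self.reported)} of {len(self.sample)} sample points reported"


center = DataCenter(sample={"PS-0417", "PS-0522", "PS-0601"})
center.receive("PS-0417", {"Party A": 140, "Party B": 150})
print(center.tally())
print(sorted(center.outstanding))
```

This sketch does not de-duplicate repeated reports for the same sample point; a working system would need a rule for that, since the three-call regime deliberately produces duplicates.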
Sample Clearing and Data Recovery
Suppose that, after the first two hours, 20 percent of the sample points from the capital city have not reported. The vital question becomes: how can these data be retrieved? The data recovery unit takes computer-generated reports from the database and starts the process of data recovery. Each computer-generated report received by the data recovery unit contains the following information from the database: the precise location of the missing sample point; the identity of the observer at that data point; the contact telephone number of that observer; the name and contact numbers of the regional coordinator for that data point; and the name and contact number of the back-up private phone contact for that data point. It may be difficult to contact the observers at the missing data point directly. They may still be at the polling station and out of telephone contact, and there are a number of possible reasons why the observer may not have reported the data to the data collection center. The polling station might have opened late, and the observer may not yet have had the opportunity to gather the data. Another possibility is that the observer tried to call the data collection center while the data center phone lines were busy. Recall, though, that observers are required to follow a three-call regime to report each piece of information: Call #2 should have gone to the regional coordinator and Call #3 to the back-up private telephone. So the data recovery unit can begin data recovery by phoning the back-up telephone assigned to that observation point, or it can call the regional coordinator. If neither has received the data from the observer, the data recovery team alerts the regional coordinator so that she or he can investigate the matter. The regional coordinator directs efforts to determine the cause of the missing data, perhaps by involving a municipal coordinator to recover the data for the missing sample point.
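The recovery steps described here amount to a short escalation ladder, sketched below under simplifying assumptions. Each contact is modelled as a hypothetical function that returns the observer's data if that contact received it and None otherwise. Trying the back-up phone before the regional coordinator is an arbitrary choice, since the text allows either order.

```python
from typing import Callable, Optional, Tuple

Contact = Callable[[str], Optional[dict]]  # hypothetical: station_id -> data or None


def recover(station_id: str, backup_phone: Contact,
            regional_coordinator: Contact) -> Tuple[str, Optional[dict]]:
    """Try the Call #3 back-up phone, then the Call #2 regional coordinator;
    if neither holds the data, escalate to the coordinator to investigate."""
    for source, contact in (("back-up phone", backup_phone),
                            ("regional coordinator", regional_coordinator)):
        data = contact(station_id)
        if data is not None:
            return f"recovered from {source}; send to data entry", data
    return "not received by either contact; coordinator to investigate", None


# Hypothetical contacts: only the coordinator holds the report for PS-0522.
def backup(station_id: str) -> Optional[dict]:
    return None


def coordinator(station_id: str) -> Optional[dict]:
    return {"Party A": 98, "Party B": 121} if station_id == "PS-0522" else None


print(recover("PS-0522", backup, coordinator))
print(recover("PS-0601", backup, coordinator))
```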
The dotted lines in Figure 7-3 indicate the calls from the data recovery unit to the back-up private telephones and to the regional coordinators. Data recovery is a continuous process throughout election day. The sample clearance unit has the task of identifying missing data points and alerting the data recovery unit when data may be missing for an entire province or state. Such patterns require immediate attention because they suggest a systemic problem: there may have been a breakdown in the observation communications system, or there may be a substantial and regionally specific problem in the administration of the election. Either way, the task of the data recovery unit is to determine the source of the problem and to alert the leadership to its scope and scale. This information also has to be relayed to the analysis unit so that analysts are aware that adjustments may have to be made in the weighting of the data for the final report.
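Spotting missing data that cluster in one province can be sketched as a comparison of each province's share of unreported sample points against a threshold. The 50 percent threshold, the province names and the station IDs are hypothetical; in practice analysts might compare each province against the national rate at the same time of day.

```python
from collections import defaultdict
from typing import Dict, List, Set


def provinces_to_flag(province_of: Dict[str, str], reported: Set[str],
                      threshold: float = 0.5) -> List[str]:
    """Return provinces whose share of unreported sample points exceeds threshold."""
    total: Dict[str, int] = defaultdict(int)
    missing: Dict[str, int] = defaultdict(int)
    for station, province in province_of.items():
        total[province] += 1
        if station not in reported:
            missing[province] += 1
    return sorted(p for p in total if missing[p] / total[p] > threshold)


province_of = {"PS-1": "North", "PS-2": "North", "PS-3": "North",
               "PS-4": "South", "PS-5": "South", "PS-6": "South"}
reported = {"PS-1", "PS-2", "PS-4"}
print(provinces_to_flag(province_of, reported))
```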
Evidence of data retrieval problems usually becomes apparent after observers have completed the task of reporting the Form 1 data, the first qualitative reports that observers call in immediately after the first voter at a polling station has cast a ballot.5 These Form 1 reports provide an early indication of where the observation effort is working and where it is not. The tasks of the data recovery unit are, first, to determine why data points are missing in the Form 1 phase of the observation, and second, to develop a strategy for reducing the number of missing data points for the crucial vote data reported in the second phase. Data may be missing from a sample point in phase 1 because an observer has fallen ill. Another possibility is that the observer’s cell phone batteries have gone dead. An observer may have been intimidated or refused entrance to the polling station by a poorly informed polling station official. Once the reason for the missing data point has been established, the regional coordinators can take steps to make sure that the problem is solved by the time the Form 2 quick count data are due to be collected. These corrective steps may entail assigning a back-up observer to the polling station, providing the observer with a new battery, or asking election officials to follow procedures ensuring that all observers are admitted to polling stations as entitled. Efforts to minimize missing data are vital because they increase the effective sample size and so reduce the margins of error in the vote count projection.
When the data recovery team recovers data for these missing sample points, the unit relays the new information directly to the data entry unit. As the recovered data are entered, they are cleared through the database, and they are automatically routed into statistical analysis and the sample clearance unit. This same procedure is replicated for each and every missing data point.
STATISTICAL ANALYSIS OF QUICK COUNT DATA
Analyzing quick count data is part art and part science. Certainly, the foundations—the sampling and the calculations of the margins of error—are grounded in pure science. But there are judgments to be made at several steps in the process of arriving at a final characterization about election-day processes. Observation data accumulate fairly rapidly on election day. It is not unusual to have as much as 30 percent of the total sample collected and digitized within 90 minutes of the opening of the polls. And as much as 65 percent of the total expected data may be available for analysis within as little as two and a half hours of the polls closing. After the digital entry of the data, the data are usually stored in a simple data file.
The primary role of the analysis unit is to develop a clear picture of the character of the election day practices by carefully examining election day observation data. With data from Form 1, for example, it becomes possible to determine the extent to which proper administrative procedures for opening the polling stations were, or were not, followed. It is the analyst’s job to ensure that the overall picture is an accurate and reliable one. That picture has to be developed one piece at a time.
The Initial Data Analyses
The very first data exploration undertaken by the data analysis unit has two goals. The first is to establish that there are no election day software or hardware problems that could interfere with the smooth flow of observation data through the entire computing system. The second goal is to scan the data for any early signs of substantive election day problems. This scanning, described in Chapter Six, involves data sweeps across all observer responses, on all items in Form 1, to determine if there are any unusual response patterns.
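One way to sketch such a sweep: for every yes/no item on Form 1, compute the share of observers reporting a problem, nationally and by region, and list any cell that exceeds a threshold. The regional breakdown, the item names, the regions and the 10 percent threshold are all hypothetical additions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sweep(reports: List[Dict[str, object]], items: List[str],
          threshold: float = 0.10) -> List[Tuple[str, str, float]]:
    """Flag (item, region, problem share) cells where the share of reports
    answering True (a problem) exceeds threshold. 'ALL' is the national cell."""
    counts = defaultdict(lambda: [0, 0])  # (item, region) -> [problems, answers]
    for r in reports:
        for item in items:
            for region in ("ALL", r["region"]):
                cell = counts[(item, region)]
                cell[0] += bool(r[item])
                cell[1] += 1
    return sorted((item, region, p / n)
                  for (item, region), (p, n) in counts.items()
                  if p / n > threshold)


reports = (  # hypothetical Form 1 answers
    [{"region": "Capital", "opened_late": False, "no_ink": False} for _ in range(5)]
    + [{"region": "Rural", "opened_late": i < 3, "no_ink": False} for i in range(5)]
)
print(sweep(reports, ["opened_late", "no_ink"]))
```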
The Evolution of the Vote Count Results
Analysts simply do not have enough time to wait until “all the data are in” to analyze election day results. Indeed, it would be a very serious mistake to wait until all of the theoretical sample data have been reported by observers. No domestic observation group anywhere has ever succeeded in collecting 100 percent of the designed probability quick count sample. This presents a dilemma. The problem is that there is no way of knowing ahead of time exactly what size the effective sample will be. That being so, the standard practice is to repeatedly examine the data as they arrive and to continue to do so up to the moment when it can be clearly established that the data have reached the point where they are stable.
This “point of stability” is an important concept that underlies the evaluation of both qualitative and quantitative findings. Technically, the data are considered to have stabilized when the addition of new information from observers has no discernible, or material, effect on the results that have already been accumulated. In practice, this means that analysts watch the data findings evolve until the basic results, the distributions across the key variables, do not change. To establish a point of stability, analysts have to plan regular “takes” of the data, regular intervals at which additional pieces of the accumulating data are downloaded from the quick count database and analyzed.
There is no hard and fast rule about precisely what these intervals should be or how regularly the data takes should be timed. One of two criteria is usually used. The frequency of the data takes might be set according to timed intervals: Take 1 (T1) might be 30 minutes after the polls have closed, T2 one hour after they have closed, T3 an hour and a half after they have closed, and so on. Alternatively, the intervals for the data takes might be set according to the number of completed cases in the evolving dataset. So T1 might be analyzed after there are 100 cases in the dataset, T2 after 200 cases, and so on.
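The count-based version of these takes, combined with a simple stability test, can be sketched as follows. The stability rule used here (every party's share moves by less than a tolerance between consecutive takes, for a set number of takes in a row) is one possible way to operationalize the "point of stability", not a prescribed standard. The tolerance, step size and arrival data are hypothetical.

```python
from typing import Dict, List, Optional


def shares(results: List[Dict[str, int]]) -> Dict[str, float]:
    """Vote shares from a list of polling-station results (party -> votes)."""
    totals: Dict[str, int] = {}
    for station in results:
        for party, votes in station.items():
            totals[party] = totals.get(party, 0) + votes
    grand = sum(totals.values())
    return {party: v / grand for party, v in totals.items()}


def stable_take(arrivals: List[Dict[str, int]], step: int = 100,
                tolerance: float = 0.01, runs: int = 2) -> Optional[int]:
    """Analyze a take every `step` arrivals (T1 after step cases, T2 after 2*step, ...).
    Return the case count at the first take where every party's share has moved by
    less than `tolerance` for `runs` consecutive takes, or None if that never happens."""
    previous, calm = None, 0
    for n in range(step, len(arrivals) + 1, step):
        current = shares(arrivals[:n])
        if previous is not None:
            change = max(abs(current.get(p, 0) - previous.get(p, 0))
                         for p in current.keys() | previous.keys())
            calm = calm + 1 if change < tolerance else 0
            if calm >= runs:
                return n
        previous = current
    return None


# Hypothetical arrivals: 100 early stations favor Party A, later ones favor Party B.
arrivals = [{"Party A": 60, "Party B": 40}] * 100 + [{"Party A": 45, "Party B": 55}] * 900
print(stable_take(arrivals))
```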
The usual procedure is for T1 to be early, perhaps after the first fifty sample points have arrived. The T1 data take serves two purposes: it provides an initial check on whether all the computer hardware and software are handling the data satisfactorily, and it provides benchmark data. The data from T2 are usually used to conduct initial data sweeps, to scan the data for unusual variations. Then, data from T3 through Tn are used to investigate in greater detail the origins, and possible causes, of these variations. At issue are a number of key questions. What is the scope of the problems? Are the problems randomly distributed or not? If the problems are not randomly distributed, then in what ways can the distributions be said to be non-random? And does the non-random distribution of problems work to the material benefit of any party competing in the election?
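The question "are the problems randomly distributed?" can be given a first statistical answer with a chi-square test of homogeneity, which compares how often a problem was reported in each stratum or region. This is a standard technique swapped in as a sketch, not a method the chapter prescribes. The counts are hypothetical, and 5.991 is the 5 percent critical value of the chi-square distribution with two degrees of freedom, which applies when three groups are compared.

```python
from typing import List, Tuple


def chi_square_homogeneity(groups: List[Tuple[int, int]]) -> float:
    """Pearson chi-square statistic for a 2 x k table. Each group is
    (stations reporting the problem, stations not reporting it)."""
    total_yes = sum(y for y, _ in groups)
    total_no = sum(n for _, n in groups)
    grand = total_yes + total_no
    stat = 0.0
    for yes, no in groups:
        size = yes + no
        for observed, column_total in ((yes, total_yes), (no, total_no)):
            expected = size * column_total / grand
            if expected > 0:  # an empty column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_5PCT_DF2 = 5.991  # chi-square 95th percentile, 2 degrees of freedom

# Hypothetical counts of one Form 1 problem in capital, other urban and rural strata.
strata = [(5, 95), (6, 94), (30, 70)]
stat = chi_square_homogeneity(strata)
print(f"chi-square = {stat:.2f}; non-random at 5% level: {stat > CRITICAL_5PCT_DF2}")
```

A significant result only says that the problem is unevenly spread; whether that unevenness benefits any party still has to be judged against where each party's support lies.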
Analyzing the Data by Strata
To this point, discussion has focused only on aggregate analysis; all of the available data are considered together as a single block of data. There are, however, compelling reasons to unpack the data when the vote count data (Form 2 data) are being analyzed. The standard practice is to divide the total sample into components (strata) and to examine, in detail and separately, the data from each of these different components. The strata, or segments of the total sample, that are commonly identified for this purpose often take the following form:
- Stratum 1 – all sample points within the capital city;
- Stratum 2 – data from sample points in all urban areas outside the capital city; and
- Stratum 3 – the remaining points in the sample, from all rural areas in the country.
Strata may be defined differently in different countries. Capital cities are nearly always treated as a single stratum for the simple reason that they are usually the largest urban population concentration in the country and may contain as much as one third of the total population (and so, one third of the total sample). The precise definitions of the other relevant strata require careful consideration. Selected strata should be relatively homogeneous. For example, a stratum might be defined by a regionally distinct ethnic or religious community, or by an area with historically different political loyalties. Alternatively, strata might include a part of the country with a unique economy, such as a coastal region. For analytical purposes, however, it is rarely useful to identify more than four strata within the total population. Ideally, the strata should be of roughly equal size.
The strategy is to examine the evolution and sources of variation in the data separately for the capital city (Stratum 1), for urban areas outside the capital city (Stratum 2) and for rural and remote areas (Stratum 3).
There are a number of reasons for analyzing the data using this stratification procedure. First, as has already been pointed out, data typically arrive at the data collection centers at different rates from different regions. Second, it is quite possible, and in fact quite likely, that different political parties will have different strengths and levels of citizen support among different communities in different parts of the country. Political parties often appeal to different class interests (e.g., the professional/business middle class or agricultural workers) and to different communal groups defined by language, religion, ethnicity or age. The point is that these communities, or interests, are hardly ever distributed evenly throughout the country. Those uneven distributions are usually reflected in regional variations in support for parties and in the evolution of quick count results. The following example illustrates this point:
- In one country, different parties have different levels of support within different demographic segments of the population. Consequently, shifts in the balance of support for political parties during the evolution of quick count results (T1 … Tn) may simply reflect what are technically called “composition effects.” Party A may appeal to the young, and Party B to older citizens. If more young people live in the capital city, then “early” results from the quick count might show that Party A is ahead. These aggregate results change as data arrive from those parts of the country with higher concentrations of older people. In preparing for the analysis of quick count data, analysts should become familiar with what these variations might be. Census data, data from previous elections and knowledge of the historical bases of support for the parties are all useful sources of this kind of background information.
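The composition effect can be reproduced with a few invented numbers. In the hypothetical sketch below, Party A leads in the capital and Party B leads in rural areas, and capital results dominate an early take. A simple pooled share of the data received so far then overstates Party A. Weighting each stratum's share by that stratum's assumed share of the electorate does not depend on which stratum happened to report first (provided every stratum has some data). All figures, including the equal weights, are illustrative assumptions.

```python
from typing import Dict, List


def pooled_share(stations: List[Dict[str, int]], party: str) -> float:
    """Party share of all votes received so far, ignoring where they came from."""
    votes = sum(s[party] for s in stations)
    total = sum(sum(s.values()) for s in stations)
    return votes / total


def stratified_share(by_stratum: Dict[str, List[Dict[str, int]]],
                     weights: Dict[str, float], party: str) -> float:
    """Weight each stratum's party share by the stratum's share of the electorate
    (weights are assumed to sum to 1)."""
    return sum(weights[name] * pooled_share(stations, party)
               for name, stations in by_stratum.items())


weights = {"capital": 1 / 3, "urban": 1 / 3, "rural": 1 / 3}  # hypothetical
early = {                                                     # an early take
    "capital": [{"A": 60, "B": 40}] * 40,
    "urban":   [{"A": 50, "B": 50}] * 10,
    "rural":   [{"A": 40, "B": 60}] * 5,
}
all_early = [s for stations in early.values() for s in stations]
print(f"pooled A share:     {pooled_share(all_early, 'A'):.3f}")
print(f"stratified A share: {stratified_share(early, weights, 'A'):.3f}")
```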
By analyzing the different strata separately, analysts can establish the point of stability more reliably. In fact, the most reliable, and conservative, practice is to determine the point of stability for each stratum. Following exactly the same procedures outlined in Chapter Five, it is useful to calculate the margin of error for each stratum. With that calculation in hand, analysts can determine the minimum number of data points required within each stratum to achieve a margin of error of, say, 1 percent. Using that guideline, analysts can determine quite precisely how many sample points are required from each stratum for the data within that stratum to stabilize. When the point of stability is reached for a stratum, the addition of new sample data will have no material impact on the distribution of the vote within that stratum. Once the data have stabilized within all strata, the addition of new data cannot materially change the distribution of the vote for the country as a whole. The aggregate result, after all, is a weighted combination of the stratum results. Figure 7-4 provides a graphic summary of how vote counts “stabilize” in aggregate across the data takes T1…Tn.
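The per-stratum calculation can be sketched with the textbook sample-size formula for a proportion, n = z²p(1−p)/E², with an optional finite-population correction. This treats each stratum as a simple random sample of voters and ignores the clustering of voters within polling stations, so it is an assumption-laden illustration rather than the Chapter Five procedure; converting the result into a number of sample points would require that chapter's design assumptions. The stratum sizes are hypothetical.

```python
import math
from typing import Optional


def required_sample(margin: float, p: float = 0.5, z: float = 1.96,
                    population: Optional[int] = None) -> int:
    """Observations needed for a +/- margin at the confidence level implied by z,
    assuming simple random sampling. p = 0.5 gives the most conservative answer."""
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite-population correction
    return math.ceil(round(n, 6))           # round first to absorb float noise


strata_sizes = {"capital": 1_200_000, "urban": 900_000, "rural": 1_500_000}  # hypothetical
for stratum, voters in strata_sizes.items():
    print(stratum, required_sample(0.01, population=voters))
```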
Notice in Figure 7-4 that the early results (T1, T2 and T3) show considerable variation in the distribution of support for Party A and Party B. That variation can be explained by a combination of factors. First, the earliest data come from the capital city, where support for Party A is higher. Second, the effective sample at T1 is very small, so it produces estimates that are both biased (toward capital city results) and subject to high margins of error. By T4, as the effective sample size increases, the difference in the balance of vote support for the parties is narrowing. At T4, Party A and Party B are in a close battle, and Party B appears to be catching Party A. By T5, Party B’s popular strength in the rural areas is beginning to show. The effect is to place Party B ahead of Party A, and by T6 the data appear to have stabilized.
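The kind of trajectory summarized in Figure 7-4 can be reproduced with a short sketch that keeps a running tally after each take and reports Party A's share of the two-party vote. The take figures are hypothetical (capital city returns first, rural returns last), and the `looks_stable` flag, which checks whether recent take-to-take changes fall below a tolerance, is only a visual cue; it is not a substitute for the per-stratum margin-of-error criterion described above.

```python
def cumulative_shares(takes):
    """takes: list of (party_a_votes, party_b_votes) received in takes T1..Tn.

    Returns Party A's share of the two-party vote after each cumulative take.
    """
    shares = []
    total_a = total_b = 0
    for a, b in takes:
        total_a += a
        total_b += b
        shares.append(total_a / (total_a + total_b))
    return shares


def looks_stable(shares, tolerance=0.005, window=2):
    """True if each of the last `window` take-to-take changes is below `tolerance`."""
    if len(shares) <= window:
        return False
    recent = shares[-(window + 1):]
    return all(abs(later - earlier) < tolerance for earlier, later in zip(recent, recent[1:]))


# Hypothetical takes: capital first (Party A stronger), rural areas last (Party B stronger)
takes = [(6000, 4000), (5500, 4500), (5000, 5000), (4200, 5800), (4000, 6000), (4050, 5950)]
shares = cumulative_shares(takes)
```

With these made-up takes, Party A leads through T4 (51.75 percent) and Party B moves ahead at T5 (Party A falls to 49.4 percent), which is the composition effect at work: the early lead reflects where the data came from, not the national balance.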
Projecting the Election Result
On election day, domestic observation organizations come under intense pressure to “call the election,” that is, to release the quick count’s vote projection as early as possible. It is sometimes argued that an early projection helps to contribute to political stability. Pressure may come from the media, who are anxious to break the news and to meet their deadlines. It may come from organizations that fund the observation effort and that feel entitled to get the earliest results first. Pressure may also come from within the ranks of the election observation group, perhaps from those who want the group to be the first to release results or from those who worry that releasing the data late will make the observation effort irrelevant. Typically, pressure to release projections of electoral results as soon as possible comes from all of these sources.
The analyst’s priority, however, must be a commitment to release data only after it has been clearly established that they are accurate and reliable. Releasing data that have not been thoroughly checked is a very serious mistake, and the consequences of releasing unreliable, or worse yet incorrect, data can be disastrous. The release of very early, or preliminary, data can be both misleading and counterproductive, and it may undermine the legitimacy of the quick count and of the entire observation effort. There are very strong reasons, then, to exercise caution. All of the results should be re-checked even after the data have apparently reached the point of stability.
The following checks on the data are now standard, and they help to increase confidence in the election observation findings:
- Voter turnout rate—Recall from Chapter Five that the efficacy of the sample depends partly on assumptions about levels of voter turnout. Previous elections provide a record of the country’s typical voter turnout rates, and analysts usually use that information to estimate the required sample size. Expected turnout is also factored into calculations of the margins of error. There is no way to know what turnout will be before election day, but Form 2 quick count data provide a direct measure of actual voter turnout on election day. So, the analytic questions to examine are: was voter turnout in this election higher or lower than average, and does it meet the assumptions used in the original calculation of the margins of error? If turnout meets or exceeds the level assumed in that calculation, then there is no problem. But if turnout is lower than expected, the margins of error have to be recalculated, and the new criterion has to be applied to the stabilized data. A lower than expected turnout may mean that the effective sample size has to be somewhat larger than originally anticipated, which might mean delaying the announcement of a result until the minimal criterion is satisfied.
- Rogue data—In nearly all election observations, there are findings that are difficult to account for and that apparently indicate that procedural requirements for the administration of the election may, to some extent, have been violated. In some instances, these “findings” might be attributable to something as simple as data input errors, which can and should be corrected. In other cases, there may be genuine rogue results. If, for example, quick count data show that 757 votes were recorded at a particular polling station when the allowable maximum for each polling station is 600 votes, then this rogue result should be documented and investigated. If the number of rogue cases is large, then there may be reasons to question the legitimacy of the count. The prudent strategy is to conduct a late sweep of the data, before the quick count results are released, to identify the scope and scale of these “outlying” results.
- Missing data—Even though the data on the vote count may have stabilized by T6, as in the example in Figure 7-4, it is almost certain that not all of the sample will have reported. Missing data require the attention of analysts, who must determine how the missing data are distributed across the sample. If the missing data are distributed relatively evenly among the strata (capital city, urban areas outside of the capital, and rural/remote areas), then adding these data to the sample is unlikely to have a material effect on the outcome predicted by the stabilized data. The problem is that missing data tend not to be evenly distributed throughout the effective sample: data from rural/remote areas are usually more likely to be missing than data from the capital city region. In that case, it is prudent to run an analytic check to determine what the overall result would look like if there were no missing data. That can be done by analyzing the differences in vote distributions for the competing political parties within each stratum and then supplementing the stabilized data with weighted estimates for the missing data. The weights are determined arithmetically by the proportional distribution of missing data across the strata. For example, if in the rural areas Party B’s support exceeds Party A’s by a ratio of 6:4, and 50 percent of the missing data are in the rural areas, then the stabilized results are adjusted by allocating the votes for those missing rural cases to Party B and Party A in a ratio of 6:4. The same procedure is followed for each of the other two strata. This weighting procedure is a technical adjustment to the stabilized data from the effective sample.
For statistical reasons, if the minimal limits for each of the strata have been satisfied, then it is highly unlikely that such adjustments would have any material effect on the outcome of the election. Nonetheless, this weighting adjustment produces a statistically more accurate quick count result.
- Projecting a close race—The most difficult circumstances facing quick count analysts arise from a very close competition between rival political parties. Under these conditions, it is particularly important for analysts to resist any pressure for the early release of quick count results and to concentrate on the main task of accumulating as much data from the sample as possible. At issue is the margin of error of the effective sample. If the stabilized results show that the votes for the main contestants for office (Party A and Party B) are separated by less than the margin of error of the effective sample, then the quick count results cannot statistically project who the winner will be. The same principle can be expressed as a more positive rule of thumb: quick count data are reliable and can be released when the data within each stratum have reached the point of stability, and when the difference in levels of voter support for rival political parties exceeds the margin of error of the effective sample.
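The voter turnout check amounts to recomputing the margin of error with the turnout actually observed. The sketch below assumes a simple approximation in which the effective sample is the number of votes cast at the sampled stations, optionally inflated by a design effect to account for clustering by polling station; `registered_per_station`, `p`, `z` and `design_effect` are illustrative assumptions, and this is not the Chapter Five formula.

```python
import math


def votes_margin_of_error(stations, registered_per_station, turnout,
                          p=0.5, z=1.96, design_effect=1.0):
    """Margin of error when the effective sample is the votes cast at sampled stations.

    Simple-random-sample approximation multiplied by an assumed design effect.
    """
    votes_cast = stations * registered_per_station * turnout
    return z * math.sqrt(design_effect * p * (1 - p) / votes_cast)


def stations_needed(target_moe, registered_per_station, turnout,
                    p=0.5, z=1.96, design_effect=1.0):
    """Smallest number of reporting stations that meets target_moe at the given turnout."""
    votes_needed = design_effect * z ** 2 * p * (1 - p) / target_moe ** 2
    return math.ceil(votes_needed / (registered_per_station * turnout))
```

Under this approximation the margin of error scales with one over the square root of turnout, so if analysts planned for 60 percent turnout and Form 2 data show 45 percent, the margin of error widens by a factor of about 1.155 (the square root of 0.60/0.45), and more stations must report before the original criterion is met.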
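The late sweep for rogue data described above can be automated as a set of consistency checks run over every Form 2 report. This is a hedged sketch: the report layout (keys `station`, `votes`, `total`) is hypothetical, the 600-vote ceiling comes from the example in the text, and the other two rules (no negative counts, party votes not exceeding the recorded total) are assumed checks that a real observer group would replace with its own.

```python
def flag_rogue_stations(reports, max_votes=600):
    """Return a list of (station, reason) flags for reports that fail basic checks.

    Each report is a dict with hypothetical keys:
      "station": identifier, "votes": {party: count}, "total": total votes recorded.
    """
    flags = []
    for report in reports:
        station, votes, total = report["station"], report["votes"], report["total"]
        if total > max_votes:
            flags.append((station, f"total of {total} exceeds the maximum of {max_votes}"))
        if any(count < 0 for count in votes.values()):
            flags.append((station, "negative vote count"))
        if sum(votes.values()) > total:
            flags.append((station, "party votes exceed the recorded total"))
    return flags


def rogue_share(reports, flags):
    """Fraction of reporting stations with at least one flag (the scope of the problem)."""
    flagged = {station for station, _ in flags}
    return len(flagged) / len(reports) if reports else 0.0
```

Flagged stations are candidates for documentation and follow-up, not automatic exclusions; some will turn out to be data entry errors that can be corrected.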
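The missing-data adjustment can be expressed as a small function that fills in each stratum's non-reporting stations using that stratum's observed party ratio. The text does not say how many votes to attribute to a missing station, so this sketch assumes each one cast the stratum's average reported vote total; the input layout and function name are hypothetical.

```python
def adjust_for_missing(strata):
    """Return (unadjusted_share_a, adjusted_share_a) of the two-party vote.

    strata maps a stratum name to a dict with hypothetical keys:
      "reported": list of (party_a_votes, party_b_votes), one per reporting station
      "missing":  number of sampled stations in that stratum that have not reported
    Assumption: each missing station casts the stratum's average reported vote total,
    split between the parties in the ratio observed in that stratum.
    """
    raw_a = raw_b = adj_a = adj_b = 0.0
    for name, s in strata.items():
        if not s["reported"]:
            raise ValueError(f"no reporting stations in stratum {name!r}")
        a = sum(x for x, _ in s["reported"])
        b = sum(y for _, y in s["reported"])
        raw_a, raw_b = raw_a + a, raw_b + b
        avg_votes = (a + b) / len(s["reported"])
        imputed = s["missing"] * avg_votes  # votes attributed to missing stations
        share_a = a / (a + b)               # stratum's observed Party A ratio
        adj_a += a + imputed * share_a
        adj_b += b + imputed * (1 - share_a)
    return raw_a / (raw_a + raw_b), adj_a / (adj_a + adj_b)


# Toy data: the rural stratum splits 6:4 for Party B and has three stations missing
strata = {
    "capital": {"reported": [(300, 200), (280, 220)], "missing": 0},
    "rural": {"reported": [(160, 240), (80, 120)], "missing": 3},
}
```

For these toy figures Party A's share moves from 51.25 percent unadjusted to 47.2 percent adjusted. The shift is deliberately exaggerated by giving a tiny sample a large share of missing stations; with strata that have met their minimal limits, the adjustment should be far smaller.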
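The rule of thumb for a close race can be written as a release check. This sketch simply encodes the two conditions stated in the text; the inputs (a stability flag per stratum, the two parties' estimated shares and the margin of error of the effective sample) are assumed to come from earlier analysis, and the function name is hypothetical.

```python
def release_decision(stratum_stable, share_a, share_b, margin_of_error):
    """Release only when every stratum has reached its point of stability and
    the gap between Party A and Party B exceeds the margin of error of the
    effective sample. Returns (ok_to_release, reason)."""
    unstable = [name for name, stable in stratum_stable.items() if not stable]
    if unstable:
        return False, "not yet stable in: " + ", ".join(sorted(unstable))
    gap = abs(share_a - share_b)
    if gap <= margin_of_error:
        return False, f"gap of {gap:.3f} is within the margin of error ({margin_of_error:.3f})"
    return True, f"gap of {gap:.3f} exceeds the margin of error ({margin_of_error:.3f})"
```

Returning a reason along with the decision makes it easier to explain to stakeholders why a projection is being withheld.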
Careful analysts will work through all of the above checks before coming to their conclusion.
Most observer groups now routinely work with sufficiently large random samples that they are unlikely to face the problem of elections that are statistically “too close to call.” Even under these unlikely circumstances, of course, domestic observer groups have a vital role to play. In these situations, they should promote and monitor a comprehensive and completely transparent vote count by election authorities, as well as the impartial and expedited resolution of any electoral complaints.
Moreover, analysis of the quality of voting and counting processes (together with analysis of the broader electoral environment) can help determine whether official results are to be accepted as credible.
*All content is drawn from NDI’s “The Quick Count and Election Observation,” which covers this section in more detail.
1 Readers should refer to Chapter Six, The Qualitative Component of the Quick Count, for more detailed information on how qualitative data are collected and analyzed.
2 See Appendix 10 for additional information on the Malawi data collection process.
3 These types of direct data entry systems are far more efficient because built-in software safeguards alert data entry personnel to “illegal” responses to categories in observation forms. Keeping the observer on the telephone line during data entry reduces inaccuracies and eliminates the time-consuming, and sometimes futile, task of trying to re-contact observers to resolve inconsistent or illegible responses that often appear in hand-copied forms.
4 The database may track additional information concerning the organization’s staff and volunteers of various types, such as skills or types of tasks performed during the course of the election observation (e.g., types of pre-election monitoring undertaken) and interests/activities beyond election monitoring (e.g., voter education, “congress watch,” etc.).
5 Chapter Six, The Qualitative Component of the Quick Count, details the content and reporting procedures for Form 1.