Observing the Scottish elections, May 2007

The Open Rights Group released, on 20 June 2007, its observers' report on e-Voting and e-Counting.

I was one of the ORG's election observers, watching the election itself, and particularly the electronic count, in Glasgow. I've included my own report for ORG here as well. The comments below, as well as those in the report, are mine, and not endorsed (of course) by ORG.

My own impressions of the elections were somewhat less pessimistic than the overall ORG ones. I think that's mainly because I saw only e-counting (e-voting, I think, is fairly obviously nuts), and because the Electoral Commission in Scotland, and the organisers of the count in Glasgow, seemed pretty well-organised, and actively concerned to make the whole process as transparent as possible. Below, I've included some fairly informal supplementary remarks to my report, some of which disagree mildly with the ORG conclusions.

I came into this very sceptical about the security of electronic counts; I'm a lot less sceptical than I was, though I still think there are very significant problems, albeit ones with apparently simple solutions.

My impressions

It worked!

...very much to my surprise. I don't know whether Glasgow was lucky or particularly well-organised, but the Glasgow count seemed to go very smoothly, with results appearing within spitting distance of the local authority's four-hour estimate. I know there were software and hardware problems in other (Scottish) centres, so the system contains at least scope for disasters, but can also manifestly be made to work.

The English counts seem to have been much more problematic; see the ORG report for the gruelling details.

Enabling STV

One of the better arguments in favour of e-counting is that it makes intricate STV counts feasible, and since I'm rather an enthusiast for STV I tend to feel that an electronic count is therefore acceptable. The fact that the Northern Irish local authority (LA) elections are counted by hand complicates this argument, but since the Northern Irish STV system is substantially simpler than the Scottish one, the comparison is at best equivocal evidence about whether e-counting is necessary.

Statistics

The process of handling hundreds of thousands of loose sheets of paper, getting them into the one room, and not losing or mis-counting too many of them on the way, is a lot harder than you might expect. In retrospect this is pretty obvious, but it struck me pretty forcibly when I realised that much of the drill in the polling stations and at the count hall was pretty similar in outline to a traditional election, even though most of the details were substantially different.

For example, at one point in the protocol, clerks compare the number of papers in a ballot box as counted at the polling station, with the number counted by the scanner, and if the difference is in the range +1 to -3, then that's passed as close enough. That seems odd, but it turns out that this range is used because that's the margin of error that's been found, in the past, with traditional elections, to be consistently achievable.

Now, it might be the case that this error budget should be reexamined now that scanners are in the mix, but even without that, this range is useful for two reasons. Firstly, it reminds us that traditional elections did in fact have an error budget, estimated by experienced personnel, and secondly that this budget (which, at approximately ±2 in a box containing around 400 papers, comes out at ±0.5%) sets a rough scale for the various processes involved in the electronic count. The overall error in the result would be a combination of a few error sources, but if they're all around this scale, then the overall error wouldn't be massively greater than this: crudely, a process with errors no bigger than 0.5% is, in this respect at least, no worse than a traditional election.
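
As a rough illustration of that scale argument (this is my own back-of-envelope sketch, and the figure of three error sources is an assumption, not something from the official procedure), independent errors of about ±0.5% combine roughly as the root-sum-square rather than adding directly:

```python
# Back-of-envelope: if the polling-station count, the scanner count and
# the adjudication step each contribute an independent error of about
# +/-0.5%, the combined error grows as the root-sum-square, not the sum.
import math

per_stage_error = 0.005   # the traditional +/-0.5% box tolerance
stages = 3                # assumed number of independent error sources

combined = math.sqrt(stages) * per_stage_error
print(f"combined error ~ +/-{combined:.2%}")   # ~ +/-0.87%
```

So even with several stages in play, the overall error stays on the same 0.5% scale, which is the point being made above.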

It's probably also worth emphasising that in a context like this, small systematic errors (that is, bias) are more important than larger random errors. The smallest winning margin in the parliament elections was Cunninghame North at 0.2%, but the next lowest was 1.6%, so the process as a whole could probably withstand a systematic error of order 1% without serious difficulty. Since there are multiple contests, the process could arguably withstand a random error even larger than this without changing the overall Scotland-wide result (in fact, the SNP became the largest party in the Parliament by a single seat).

I mention this, not to suggest that we shouldn't care about errors in the count -- of course we should, and should be aiming for a figure well below this nominal 1% scale -- but to suggest that the whole process is reassuringly far from meltdown. The ballot-handling process is such that it would be very hard, I believe, for any individual to create a bias anywhere near as high as 1%. What could create such bias, however, are the black boxes comprising the scanners and counting software, and that's where the real problems lie.
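
To make the systematic-versus-random point concrete, here is a toy Monte Carlo of my own, using entirely invented margins rather than the real 2007 results: a uniform 1% bias flips every sub-1% contest the same way, while independent random errors of the same size flip a few contests in each direction and largely cancel.

```python
import random

random.seed(1)

# 72 hypothetical contests: signed winning margins for "side A", from
# 0.2% up to ~18%, half held by each side (all figures invented).
magnitudes = [0.002 + 0.005 * i for i in range(36)]
margins = [s * m for m in magnitudes for s in (+1, -1)]

def net_seat_change(margins, errors):
    """Net seats gained by side A when each contest's A-lead shifts by errors[i]."""
    gained = sum(1 for m, e in zip(margins, errors) if m < 0 < m + e)
    lost = sum(1 for m, e in zip(margins, errors) if m > 0 > m + e)
    return gained - lost

# A uniform 1% shift towards A flips every contest A lost by under 1%.
systematic = net_seat_change(margins, [0.01] * len(margins))

# Independent zero-mean errors of the same 1% scale flip contests in both
# directions, so the net effect averages out to roughly zero.
trials = 2000
average = sum(
    net_seat_change(margins, [random.gauss(0, 0.01) for _ in margins])
    for _ in range(trials)
) / trials

print(f"systematic: net {systematic:+d} seats; random: net {average:+.2f} on average")
```

The bias produces a consistent net seat change, where random noise of the same magnitude produces almost none on average, which is why the black boxes below matter more than sloppy paper-handling.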

Black boxes

The main new loci of error, and the potential sources of accidental or deliberate biases or random errors, are the two black boxes in the election: the scanners, and the counting machine.

The standard response to this is to use open-source software, and have the code publicly reviewed. I'm no longer convinced that's as easy an answer as it sounds. As anyone who's had to manage a network will know, printers and scanners are swines, always failing in the most irritating ways at the most inconvenient times, and it would be decidedly non-trivial to integrate the various required bits of hardware and software into a complete system that would work at full capacity from precisely 22.00 on the appointed day, with minimal opportunities for testing, while everyone was watching. I wouldn't want that job.

OK, then, let's leave the messy system integration to contractors, and have the important bits of publicly assured software running on assured hardware, with the Returning Officer formally responsible for booting them from assured media..., and so on. But I don't really see this working either, as it points towards a more complicated protocol, and more, and more disparate, bits of hardware, increasing the complexity of the systems-integration problem, and so directly decreasing its reliability.

The only way out of this, I think, is by public testing of the live system. Let the contractors implement the system however they like, subject only to the requirement that they emit machine-readable logging information at appropriate points, in particular the system's decision about each and every ballot paper. This would mean that:

  1. Ballot boxes could be chosen by the Returning Officers at random while the count was going on, counted by hand, and compared with the set of decisions logged by the system. It would be a straightforward statistical calculation to work out how many boxes would have to be sampled in order to detect a given level of random or systematic error.
  2. The ballot paper data could then be counted by multiple independent algorithms, and compared with the contractor's calculation of the overall contest result. This data could be made public (there was a Scottish Executive consultation on this; I don't know the outcome, nor really understand why this should be a major problem), or, if that is problematic for some reason, the calculation could be verified by some manifestly public mechanism, either during the count or soon enough after it to support a challenge.
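
The statistical calculation mentioned in point 1 really is straightforward. As a sketch (my own illustration, with invented figures, and assuming the total number of boxes is large enough that sampled boxes are effectively independent draws): if a fraction f of the boxes were miscounted badly enough for a hand count to notice, the chance that n randomly chosen boxes all look clean is (1 − f)^n, so the required sample size follows directly.

```python
import math

def boxes_to_sample(bad_fraction, confidence):
    """Smallest n with P(at least one bad box in the sample) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - bad_fraction))

# e.g. a 95% chance of catching tampering that affects 5% of the boxes
# needs only a few dozen boxes hand-counted, however many boxes exist:
print(boxes_to_sample(0.05, 0.95))   # 59
```

The encouraging feature is that the sample size depends on the fraction of bad boxes and the desired confidence, not on the total number of boxes, so the hand-count burden stays modest even for a large count centre.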

Conclusions and recommendations

These are the summary conclusions I came to at the end of my report. The paragraph and section references are to the report.

1. Overall

The count I observed in Glasgow was very successful, from the point of view of both the ballot-handling protocol and the mechanical aspects of the counting technology. There have been media reports of technical problems at some other counts.

...all very much to my surprise. I don't know whether Glasgow was lucky or particularly well-organised, but the Glasgow count seemed to go very smoothly, with results appearing within spitting distance of the LA's four-hour estimate. I know there were software and hardware problems in other (Scottish) centres, so the system contains at least scope for disasters, but can also manifestly be made to work.

2. Problems

The election as a whole was marred by a number of problems. The design of the ballot papers seems to have caused a large number of spoiled votes (Sects. 3.3 and 4), and there was a pre-election controversy about the handling of the large number of postal ballots.

The design of the ballot papers does appear to have been botched, not least by using a market-research consultancy to evaluate the designs, rather than usability experts. However, I'm just concerned with the e-counting, here.

3. Protocol

The protocol for the handling of ballot papers seems at least as secure as the traditional system (para. 35 and following, TOR 1), with similar tradeoffs between security and usability. There seems to be a claim that the format of the UIMs helps prevent forgery (para. 50); this should be examined in more detail than is available here (TOR 2).

The process of handling hundreds of thousands of loose sheets of paper, getting them into the one room, and not losing too many of them on the way is a lot harder than you might expect. In retrospect this is pretty obvious, but it struck me pretty forcibly when I realised that much of the drill in the polling stations and at the count hall was pretty similar in outline to a traditional election, even though most of the details were substantially different.

For example, at one point in the protocol, clerks compare the number of papers in a ballot box as counted at the polling station, with the number counted by the scanner, and if the difference is in the range +1 to -3, then that's passed as close enough. That seems odd, but it turns out that this range is used because that's the margin of error that's been found, in the past, with traditional elections, to be consistently achievable.

Now, it might be the case that this error budget could be reexamined now that scanners are in the mix, but even without that, this range is useful for two reasons. Firstly, it reminds us that traditional elections did in fact have an error budget, estimated by experienced personnel, and secondly that this budget (which, at approximately ±2 in a box containing around 400 papers, comes out at ±0.5%) sets a rough scale for the various processes involved in the electronic count. The overall error in the result would be a combination of a few error sources, but if they're all around this scale, then the overall error wouldn't be massively greater than this.

It's probably also worth emphasising that in a context like this, small systematic errors (that is, bias) are more important than larger random errors. The smallest winning margin in the parliament elections was Cunninghame North at 0.2%, but the next lowest was 1.6%, so the process as a whole could probably withstand a systematic error of order 1% without serious difficulty. Since there are multiple contests, the process could withstand a random error substantially larger than this without changing the overall Scotland-wide result.

4. Scanning

The scanning and OCR technology used appears to be conservative, increasing confidence that it has a low error rate (para. 82, para. 85, Sect. 5.2). Efforts have been made to assure the algorithmic correctness of the counting program (para. 110) (TOR 2).

I ended up persuading myself that the scanners were probably conservative, though there are arguments in the ORG report that this may be optimistic.

5. Systems

The support systems for the count appear well-designed in general, but there are some minor opportunities for improvements (para. 48).

6. Risk to Secrecy

There is no obvious additional risk to the secrecy of the ballot resulting from the electronic counting (TOR 3), other than whatever risk is associated with the retention of ballot data discussed in para. 113.

Note, just in case it needs saying, that this is quite distinct from suggesting that there is no problem with electronic voting, which is a very different nest of problems indeed.

7. Necessity for validation

The principal difference between this and a traditional election is the presence of software, and the fundamental difficulties of observing this, closely enough to detect accidental or deliberate malfunctions (TOR 1 and 2). It is absolutely necessary to make manifest the integrity of the system, and a possible and inexpensive means of doing this, by releasing intermediate steps in the count, is discussed in Sect. 5.5.

I believe this validation can be more effectively done by testing outputs, rather than by formally validating the hardware and software involved.

Norman Gray
11 May 2007