Multi-stage Hay voting

Table of Contents

1 Multi-stage Hay voting

1.1 Body

1.1.1 Introduction

  • Who invented Multi-stage Hay voting?

    Multi-stage Hay voting was invented by me, Tom Breton (Tehom). It uses Peter de Blanc and Marcello Herreshoff's Hay Voting as a building block.

  • What it is

    Multi-stage Hay Voting is a voting system. That is, it is a way of deciding an election based on ballots that people have cast.

  • What it's good for

    Multi-stage Hay Voting has the truth-revealing property that Hay voting has, or nearly preserves it, and solves the drawbacks.

1.1.2 About (plain) Hay Voting

Hay Voting is a voting system invented by Peter de Blanc and Marcello Herreshoff.

  • Advantages

    Hay Voting has the unusual property that it is a truth-revealing protocol. That is, any voter's best strategy is always to vote sincerely. (Arrow's impossibility theorem notwithstanding). Multistage Hay voting does not entirely preserve this property, but is able to mostly defang strategic voting.

  • Drawbacks
    • Hay voting relies on starting with a "clean" slate of candidates. It can easily be thrown out of whack by:
      • candidates that are clones of other candidates
      • candidates that almost everyone strongly dislikes
      • other distortions of the slate of candidates.
    • Hay voting, because it is dominated by its random element, also runs a severe risk of making choices that nobody likes.
  • How it works

    For now this links back to Peter de Blanc's intro to Hay Voting.

  • Why Hay voting doesn't run afoul of Arrow's impossibility theorem

    Kenneth Arrow's impossibility theorem states that no voting system can have all of the following properties at the same time:

    Monotonicity
    If more voters like a candidate A, this doesn't hurt candidate A (could help him or could make no difference)
    Independence of irrelevant alternatives
    If a voter prefers to vote for candidate A over candidate B, introducing another candidate should not make him now prefer B over A.
    Deterministic
    Non-random. Given the same inputs, always runs the same way and produces the same outcome.
    Non-imposition
    Doesn't pick winners by any arbitrary criterion such as alphabetical order. More formally, every outcome should be achievable by some set of votes.
    Non-dictatorship
    Does not simply use the preferences of one voter as the collective outcome.
    Unrestricted domain
    The voters preferences can range freely over the candidates. For instance, it is not assumed that every voter who likes A better than B also likes B better than C.
    Multiple candidates
    Not restricted to two candidates per election.
    Complete outcome
    Generates a complete ranking of the candidates.

    Hay voting escapes criterion #3 because it incorporates a random element.

1.1.3 How Multi-Stage Hay voting works:

  • Start with any slate of candidates, as "dirty" as you please. There's no special procedure for selecting them.

    diagrams/original-slate.png

  • Now use Hay Voting to choose a candidate. But this candicate won't be elected, it will just be used in the next round.

    diagrams/slate-after-hay.png

    Repeat this procedure N times …

    diagrams/slate-after-hay-2.png

    …to select N candidates. N should be many times larger than the original slate of candidates.

    It is OK if the same candidate is chosen more than once. If a candidate A is chosen twice in round 1, A has two slots in the slate for round 2.

    diagrams/slate-after-n-hay.png

  • Now do the same to the new slate of candidates.
    • Recalculate each voter's voting mass redistributions in accordance with his original preferences and the new slate of candidates. It is easy to do so optimally, because the optimal allocation is to give each clone of a given candidate the same transfer of voting mass.

      diagrams/slate-next-iteration.png

      This new distribution of voting mass will apply to the new slate of candidates.

      diagrams/slate-next-iteration-2.png

      NB, even though we still use the original voting preferences, the probable outcomes are different because the slate is different. With Hay Voting each "slot" on the slate, whether it's a clone or not, in general has a non-zero probability of being selected.

    • As before, pick a vote at random. Using the new voting mass redistributions, pick a candidate.

      diagrams/slate-next-iteration-3.png

    • As before, repeat this procedure N time to make the next slate of candidates.

      diagrams/slate-next-iteration-4.png

  • Repeat this procedure many times, each time using the slate of candidates generated in the most recent round.

    With near-certainty, the genuine favorites will come to dominate the slate.

  • At some point, terminate and pick a winner. Termination could be handled several ways:
    • Repeat until only one candidate survives. This is simple but likely to take a very long time. When there is more than one seat at issue, this can create ties.
    • Repeat until one candidate dominates by a predetermined ratio. This is vulnerable to bad initial slates.
    • Repeat until one candidate both dominates by a predetermined ratio and has gained clones in a predetermined number of successive rounds.
    • Repeat a fixed number of times and choose the candidate with the most clones in the final slate, breaking ties any way you like.

      I suggest breaking ties by plain Hay voting, just for consistency's sake.

I suggest that a flattening rule be used in all rounds except the last, in order to make strategic voting pointless.

If greater certainty is wanted, use larger N and more lenient stopping conditions.

The stopping conditions generalize for when there is more than one seat at issue.

1.1.4 About flattening

  • The problem that flattening solves

    Without flattening, strategic voting can still happen. Consider a situation where other voters' collective preferences are:

            A > B > C
    

    and your preferences are:

            B > A > C
    

    and your vote is among those chosen on the next-to-last round. You'd do better if A is not in the running, so that it would be a contest between B and C, which B would likely win.

    Does it make sense to insincerely vote as if your preferences are:

            B > C > A
    

    Answer: As shown in the analysis of 2-round voting, it sometimes helps you a little, but you are wasting your allowance moving voting mass from A to C. You do better by spending all your allowance helping B. That is, you flatten the difference between A and C, as if your preferences were:

            B > {C,A}
    
    • What that means

      In Multi-Stage Hay Voting without flattening, a strategic voter can sometimes realize his sincere preferences better by concentrating his vote on his out-of-step preferences and letting the other voters "carry the load" for common preferences. This is sub-optimal in the last round but may make up for it in other rounds.

      If this flaw was left unfixed, Multi-Stage Hay Voting would not be a truth-revealing protocol.

    • Computing a voter's optimal flattening

      The optimal flattening for a given voter is a function of:

      • His sincere preferences
      • All voters' preferences collectively
        • More specifically, a vector of the total probability mass across candidates, as computed for the last round.

      It is not, as I once thought, a function of how many seats are available in the election. Any election has a dual election with "anti-seats", as it were. The number of anti-seats generally doesn't equal the number of seats.

      I haven't analyzed it in detail, but I suspect the optimal flattening for a given voter can be computed something like this:

      • Represent the voter's preference set as a unit vector from the origin.
      • Represent the total probability mass across candidates also as a vector from the origin. That vector defines a line.
      • The component of the voter's preference vector parallel to that line is his in-step preferences.
        • The magnitude of this component and the next may be relevant for Probability weighting below.
        • NB, this is the same up to a scalar for all voters.
      • The perpendicular component is his out-of-step preferences, aka his flattened preferences.
  • Preliminary idea

    A simple approach, which we will see is incomplete, is to automatically flatten the voter's preferences for him. Considering just two rounds, the procedure is:

    • In round #1, use a voter's flattened preferences.
    • In round #2, use a voter's expressed preferences, unaltered.

    Here a strategic voter's optimal strategy is still to vote his sincere preferences.

  • Problems with the preliminary idea

    But this simple solution has problems:

    • It doesn't scale well to more than 2 rounds. In a third-to-last rounds, strategic voters would take into account the collective voting behavior of the second-to-last round, which is generally different than the collective voting behavior of the last round. And so forth for earlier rounds.

      This would complicate both our analysis and our mechanism for transforming sincere preferences into strategically optimal votes.

    • The flattenings would severely distort the outcome. The collective out-of-step preferences would be chosen too often and the common collective preferences would not be chosen as often as they should be.
  • Compensating for the distortion

    To combat these problems, let's modify the flattening idea slightly. Now it should work like this:

    For rounds other than the last, compute two sets of preferences for each voter:

    • The flattened preferences, as described above. Aka his out-of-step preferences.
    • The remainder of his sincere preferences. The vector sum of this set and the other is the voter's original preference vector.

    Using these two sets of preferences, construct two ballots, each ballot realizing one of the sets of preferences.

    When we use a given voter's vote, we randomly pick one of the two ballots and use it.

    NB, this is different than splitting the voter's ballot into flattened and remainder parts, which would accomplish nothing. Here the split occurs first and the preferences-to-ballot transformation occurs afterwards.

    Thus for any round, the overall tendency is essentially the same as the overall tendency of the last round.

    This has the welcome benefit that the analysis for the next-to-last round holds for all rounds other than the last. With a scaling factor due to the number of remaining rounds, I think, but basically the same.

  • Probability weighting of the two ballots

    If we were to let the probability weighting be constant, or constant across voters, a voter could still game the system by exaggerating his out-of-step preferences.

    So the likelihood that we choose the flattened ballot must vary. The more it's out-of-step, the more likely that we choose the in-step ballot instead of it. The less likely that it's out-of-step, the more likely that we choose it instead of the in-step ballot.

    This likelihood might be computed from the relative magnitudes of the voter's out-of-step and in-step preferences as computed above. It's not yet clear what the right function is from relative magnitudes to likelihood.

    If a voter:

    • overstates his out-of-step preferences, the out-of-step preferences appear larger, but are less likely to be chosen. This does not benefit him.
    • understates his out-of-step preferences, they are more likely to be chosen, but accomplish the opposite of what he wants. This does not benefit him.
  • Can ballot flattening be used strategically?

    We've said that a flattened ballot is a function of, among other things, other voters' collective preferences. This implies that a voter has another potential means of affecting the outcome. He could vote in such a way that when other voters take his preferences into account (automatically by the mechanism above), they cast sub-optimal votes. His hope would be that the other voters would be less effective at opposing his preferences.

    Does it ever make strategic sense for a voter to do this? Does this strategy ever realize any voter's preferences better than sincere voting would? I suspect not, for these reasons:

    • With the modified flattening design above, the flattened ballots are only half of the picture. The in-step ballots roughly compensate for them. So sub-optimal flattening should have little effect on the collective decision.
    • Even considering only the flattened ballots, I suspect that a voter always loses more by casting his own vote sub-optimally than he gains by making others cast sub-optimal votes, but I haven't proved it.

1.1.5 Issues

  • Can clones in the original slate cause an unfair result?

    Could an unfair situation in the original slate result in an unfair result? Asking the same thing in a more precise way, suppose we arbitrarily clone some candidate. How likely is it that his advantage in the first slate will persist thru all the rounds?

    diagrams/dirty-slate.png

    Let's look at an example of a fairly bad case. Suppose that the first slate consists of 99 indistinguishable candidates A and one alternative B, and that voters modestly favor B over A by 60/40, that is, 60% of voters are 100% on B's side, 40% are 100% on A's side. Also let N = 1000.

    Maximizing their voting power, the pro-A voters can give B a 0% chance, treating the A clones as indistinguishable. The pro-B voters can each transfer 1/99 units of voting mass from each A candidate to B, doubling B's voting mass to 2 sqrt(99). Since total voting mass is constant, this doubles B's chances to 2/99, 0.0202.

    CandidatenumbershareshareGiven pro-B,Overall
    ofofof voterschance tochance to
    clonesslatefavoringbe pickedbe picked
    A990.990.400.97980.9879
    B10.010.600.02020.0121

    With N = 1000, a pro-A voter is chosen about 400 times and a pro-B voter is chosen about 600 times, (The chance that the number of pro-B voters chosen differs from 600 by more than a few percent is negligible)

    The pro-A's give all their votes to A, and the pro-B's, hamstrung by the unfavorable slate, give an average of 0.0202 slots to B; in total B gets 12.12 slots on average. B has increased his share of the slate and is likely to do so even faster in subsequent rounds.

    Candidatenumber ofshare ofChange in share
    clonesslateof slate
    A987.870.9879- 0.0021
    B12.120.0121+ 0.0021

    Could B be eliminated in the first round? The chance that B has been eliminated is small. For him to be entirely eliminated, every single voter would have have had to have the A result, a 0.9879 chance each, so given 600 pro-B voters it's a 5.054e-06 chance, which is negligible. The chance that candidate A receives only one slot is still only 6.707e-05, and there is only about a 1.8% chance that A receives fewer than 5 slots. The expected distribution is actually fairly narrow; its quartiles are at about 11 and 14.

    See the appendix for Elisp code to calculate and print the probabilities.

    Could B be eliminated in later rounds? In subsequent rounds, B is very likely to have more cloneshare than he did in the first round. There is only a 23% chance that his cloneshare has decreased at all.

    Additionally, because we chose N larger than the original slate, even if B's cloneshare has actually been reduced somewhat, his situation is not even as dangerous as in the first round. Pro-A's are no longer able to entirely exclude B and pro-B's have more leverage because they have a number of B's to transfer voting mass to - when B's cloneshere is small, pro-B's power increases by the square root of the number of B clones.

    The dynamic is much like a biased random walk on a N-1 hyperplane-like curve embedded in an N dimensional space. The curve has asymptotes at the diagonal axes for which just 1 dimension is positive. For N=2, it's a diagonal line of slope -1 passing thru the origin.

    The walk's position vector is approximately the log-odds of each candidate's cloneshare, except that there is quantization near the endpoints. That implies that the drift goes as exp(sqrt(M)) while the non-random movement goes as exp(M). Given large enough M and N, it looks like we'll almost certainly get the right winner.

    It may make sense to re-vote part-way thru the process, because with a large slate of candidates, it is difficult for voters to form a precise opinion on each, but later when there are fewer candidates voters can pay more attention to individual candidates.

  • Resemblance to STV

    The resulting action in rounds other than the last superficially resembles STV. A voter virtually bullets-votes for their favorite and then when the favorite is eliminated, bullet-votes for their next favorite, until the election is resolved. It differs from STV in that

    • the bullet-vote is only a tendency, and a fairly weak one.
    • the intervening slates are larger.

1.2 Appendixes

1.2.1 Elisp code to calculate and print the probabilities (Revised)

;;Since we sample homogeneously from a mixture of pro-A and pro-B, we
;;can just add the probabilities.
(defconst hay-example-chance-of-B 
   (* (/ 2.0 99.0) 0.6)
   "The chance of selecting B in any sample" )

(defun hay-example-600-1000 (n)
   "Calculate the probabilities related to the Multistage Hay example.
1000 voters, 

N should be a non-negative integer and not too big."

   (*
      (/ 
         (let
            ((v (float 1)))
            (dotimes (i n v) (setq v (* v (- 1000 i)))))
            
         (let
            ((v (float 1)))
            (dotimes (i n v) (setq v (* v (1+ i))))))
         
      (exp
         (+
            (* (log hay-example-chance-of-B) 
               (float n))
            (* (log (- 1.0 hay-example-chance-of-B)) 
               (float (- 1000 n)))))))

(let
   ((cumul 0))
   (dotimes (i 30)
      (princ i (current-buffer))
      (let
         ((prob (hay-example-600-1000 i)))
         (setq cumul (+ cumul prob))
         (print
            prob
            (current-buffer))
         (princ "Cumulatively," (current-buffer))
         (print
            cumul
            (current-buffer)))
            
      (princ "\n" (current-buffer))))

1.2.2 Analysis

Analysis of the effect of moving one unit of cloneshare from one candidate to another.

We'll analyze it as if there were 3 candidates, X, Y, Z. Z is a third candidate that is only indirectly affected. Z can stand homogeneously for all the remaining candidates, except as far as Z's internal heterogeneity affects the scaling factor, which is not much.

  • Some conventions

    Amount of cloneshare that candidate X would have in the sincere slate:

    #X

    This notation #X will only refer to the cloneshares of the sincere slate; cloneshare in the strategic slate will be indicated by appropriate formulas in terms of #X.

    The utility function U will be indexed by crowd or you where it's ambiguous.

  • What happens

    Replacing one X clone with a Y clone has the following effects:

    • Delete the transfer between the removed X and all the Z's
    • Delete the transfer between the removed X and all the Y's
    • Create a transfer between the new Y and all the X's
    • Create a transfer between the new Y and all the Z's
    • There is one less X to be chosen.
    • There is one more Y to be chosen.

    And renormalize.

  • Magnitude of transfers

    All transfers between to an X from all the Y's follow the formula:

    U(X-Y)crowd #Y

  • Probability

    The probability that candidate i is chosen by a member of the crowd on a single Hay vote is:

            P(i) = 
                    (sqrt(n - 1) + votingmasstransfer(i)) 
                    / n sqrt(n - 1)
    
                  = (sqrt(n - 1) + c total\_utility\_delta(i)) 
                    / n sqrt(n - 1)
    

    We'll define P'(i) proportional to P(i) to eliminate the denominator and to group c with the constant term:

            P'(i) = 
                  = P(i) n sqrt(n - 1 / c
                  = (sqrt(n - 1)/c + total\_utility\_delta(i))
    

    We will index this on new and old.

    We'll also define:

            inv\_c\_delta =:  sqrt(n - 1)(1/c_{new} - 1/c_{old})
    

    This quantity is likely to be negligible, but we'll keep track of it just in case.

    The individual probabilities:

    • Effect on X's probability

      X's probability is affected just by:

      • Create a transfer between the new Y and all the X's
      • There is one less X to be chosen.

      So

              total\_utility\_delta(X)_{new} = 
                      total\_utility\_delta(X)_{old} + U(X-Y)
      

      X's new probability is:

              P'(X)_{new}
              = (#X - 1) (sqrt(n - 1)/c_{new} + total\_utility\_delta(X)_{new})
              = P'(X)_{old} 
                      - #X (sqrt(n - 1)/c_{old} + total\_utility\_delta(X)_{old})
                      + (#X - 1) (sqrt(n - 1)/c_{new} + total\_utility\_delta(X)_{new})
              = P'(X)_{old} 
                      + #X inv\_c\_delta 
                      -  sqrt(n - 1)/c_{new} 
                      + #X U(X-Y) 
                      - total\_utility\_delta(X)_{new} 
      
    • Effect on Y's probability

      Y's probability is affected just by:

      • Delete the transfer between the removed X and all the Y's
      • Create a transfer between the new Y and all the X's
      • Create a transfer between the new Y and all the Z's
      • There is one more Y to be chosen.

      The new transfers just make the new Y clone homogeneous with the other Y clones. So we need only to account for the first one:

              total\_utility\_delta(Y)_{new} 
                      = total\_utility\_delta(Y)_{old} + U(X-Y)
      

      Y's new probability is:

              P'(Y)_{new}
              = (#Y + 1) (sqrt(n - 1)/c_{new} + total\_utility\_delta(Y)_{new})
              = P'(Y)_{old} 
                      + #Y inv\_c\_delta 
                      + sqrt(n - 1)/c_{new} 
                      + #Y U(X-Y)
                      + total\_utility\_delta(Y)_{new}
      
      ***** Effect on Z's probability
      
      Z's probability is affected just by:
      
       * Delete the transfer between the removed X and all the Z's 
       * Create a transfer between the new Y and all the Z's 
      
      So
      #+BEGIN_EXAMPLE
              total\_utility\_delta(Z)_{new} 
                      = total\_utility\_delta(Z)_{old} + U(X-Z) + U(Z-Y)
                      = total\_utility\_delta(Z)_{old} + U(X-Y)
      

      Z's new probability is:

              P'(Z)_{new}
              = #Z (sqrt(n - 1)/c_{new} + total\_utility\_delta(Z)_{new})
              = P'(Z)_{old} + #Z inv\_c\_delta + #Z U(X-Y)
      
    • Effect on voter's total utility

      The total utility deltas of X and Y differ exactly where i or j is A or Y. When i and j are the same, the delta is zero, so we need not handle that case specially. So:

              total\_utility\_delta(X) =
                      total\_utility\_delta(Y) +
                      sum(U(X) - U(i)) -
                      sum(U(Y) - U(i))
      
              =       total\_utility\_delta(Y) +
                      sum(U(X-Y) for all i)
      
              =       total\_utility\_delta(Y) + N U(X-Y)
      
    • Does strategic voting help you?

      We saw earlier that strategic voting can help Z, but what about your other preferences? On the whole, does strategic voting help you?

      To figure out how much the substitution helps you, we can't simply use your utility function U, because this stage may not be the last stage, it feeds into another stage, and the probabilities there are not linear because there are multiple iterations. So we define U'(X), your preference for candidate X getting slots on the next slate, that varies monotically with U(X). U'(X-Y) is also defined similarly between pairs of candidates.

      So your total utility delta if you succeed in strategically replacing X by Y, conveniently scaled by n sqrt(n - 1) / c, is:

              sum((P(W)_{new}-P(W)_{old})U'(W)_{you}, member(W,{X,Y,Z}))
      
              = (P'(X)_{new} - P'(X)_{old}) U'(X)_{you} 
                      + (P'(Z)_{new} - P'(Z)_{old}) U'(Z)_{you} 
                      + (P'(Y)_{new} - P'(Y)_{old}) U'(Y)_{you} 
      
              = (#X inv\_c\_delta -  sqrt(n - 1)/c_{new} + #X U(X-Y)_{crowd} 
                              - total\_utility\_delta(X)_{new})
                              U'(X)_{you} 
                      + (#Z inv\_c\_delta + #Z U(X-Y)_{crowd}) 
                              U'(Z)_{you}
                      + (#Y inv\_c\_delta +  sqrt(n - 1)/c_{new} + #Y U(X-Y)_{crowd}
                              + total\_utility\_delta(Y)_{new})
                              U'(Y)_{you} 
      
              = (inv\_c\_delta + U(X-Y)_{crowd})
                      (#X U'(X)_{you} + #Z U'(Z)_{you} + #Y U'(Y)_{you})
                      -  sqrt(n - 1)/c_{new}U'(X-Y)_{you}
                      - total\_utility\_delta(X)_{new}_{crowd} U'(X)_{you}
                      + total\_utility\_delta(Y)_{new}_{crowd} U'(Y)_{you}
      
              = (inv\_c\_delta + U(X-Y)_{crowd})
                              (#X U'(X)_{you} + #Z U'(Z)_{you} + #Y U'(Y)_{you})
                      - ((sqrt(n - 1)/c_{new}) + total\_utility\_delta(Y)_{new}_{crowd})
                              U'(X-Y)_{you}
                      - N U(X-Y)_{crowd} U'(X)_{you}
      

      The first term means that the substitution helps you:

      • to the degree that the sincere slate would align with your preferences and
      • roughly in proportion to the crowd's positive preference for X over Y.
      • inv_c_delta is almost certainly negligible.

      Basically, by replacing a clone of X with less-favored Y, you have lifted all the other candidates a little, even the other X's.

      The second term says that the substitution hurts you:

      • to a degree that is more than the degree that the crowd likes the new candidate X more than average, and
      • in proportion to your own positive preference for X over Y.

      Both factors are likely to be negative. Given the strategic-voting assumption that the crowd's preference X > Z > Y, it is likely that total_utility_delta(Y)crowd is negative. This is offset positively by (sqrt(n - 1)/c).

      And recall that your preferences U'(X) are for slots in a slate, not for outcomes. Given that strategic voting has some effect, your preference U'(X) will be relatively less than your preference U(X), and U'(Y) relatively more than U(Y), and your U'(X-Y) is probably negative, particularly as you take later stages into account.

      The third term means that the substitution helps you in proportion to:

      • the number of slots
      • the crowd's preference X > Y, and
      • your negative utility for X to be in the next slate.

      So on the whole, it's iffy to move voting mass to a candidate you like less, but reasonable to move extra voting mass to your favorite candidate.

      Your own possible participation in the next round does not materially change this situation, because of our assumption that the crowd outnumbers you. You run a small extra risk of accidentally electing your least-favorite candidate.

1.2.3 Ballot construction

A few words on an unrelated Hay voting issue:

The original writeup of Hay voting by Peter de Blanc seemed to suggest that voters would do their own calculations, allocating points from a total 1.00.

ISTM ballot construction can be much simpler: Let the user simply assign numerical utiles to the candidates and let all the calculations be done off-ballot by the voting software.

1.3 Revision history

  • Sometime in june 2008: Started as a text file
  • 3 june 2009: Change to org file. Added illustrations with dia.
  • 4 june 2009: Added redesign of the flattening rule as a new section

Author: Tom Breton (Tehom) <tehom@localhost.localdomain>

Date: 2009-06-04 23:19:59 EDT

HTML generated by org-mode 6.25d in emacs 21