Self-stabilizing Balls & Bins in Batches: The Power of Leaky Bins



Introduction
One of the fundamental problems in distributed computing is the distribution of requests, tasks, or data items to a set of uniform servers. To simplify this process and to avoid a single point of failure, it is often advisable to use a simple, randomized strategy instead of a complex, centralized controller to allocate the requests to the servers. In the most naïve strategy (1-choice), each client chooses the server to which it sends its request uniformly at random. A more elaborate scheme (2-choice) chooses two (or more) servers, queries their current loads, and sends the request to a least loaded of these. Both approaches are typically modelled as balls-into-bins processes [2,4,5,7,13,20,22], where requests are represented as balls and servers as bins. While the latter approach leads to considerably better load distributions [4,7], it loses some of its power in parallel settings, where requests arrive in parallel and cannot take each other into account [2,22].
We propose and study a new infinite and batchwise balls-into-bins process to model the client-server scenario. In a round, each server (bin) consumes one of its current tasks (balls). Afterward, λn tasks arrive in expectation and are allocated using a given distribution scheme. The arrival rate λ is allowed to be a function of n (e.g., λ = 1 − 1/poly(n)). Standard balls-into-bins results imply that, for high arrival rates, with high probability (w.h.p.) there is in each round a bin that receives Θ(log n) balls.
Most other infinite processes limit the total number of concurrent balls in the system by n [4,5] and show a fast recovery. Since we do not limit the number of balls, our process can, in principle, result in an arbitrarily high system load. In particular, if starting in a high-load situation (e.g., exponentially many balls), we cannot recover in a polynomial number of steps. Instead, we adopt the following notion of self-stabilization: The system is positive recurrent (the expected return time to a low-load situation is finite), and taking a snapshot of the load situation at an arbitrary (even super-exponentially large) time yields (w.h.p.) a low maximum load. Positive recurrence is a standard notion for stability and basically states that the system load is time-invariant. For irreducible, aperiodic Markov chains it implies the existence of a unique stationary distribution (cf. Section 1.2). While this alone does not guarantee a good load in the stationary distribution, together with the snapshot property we can look at an arbitrary time window of polynomial size (even if it is exponentially far away from the start situation) and give strong load guarantees. In particular, we give the following bounds on the load in addition to showing positive recurrence:

1-choice Process:
The maximum load at an arbitrary time is (w.h.p.) bounded by $O\left(\frac{1}{1-\lambda} \cdot \log \frac{n}{1-\lambda}\right)$. We also provide a lower bound which is asymptotically tight for λ ≤ 1 − 1/poly(n). While this implies that already the simple 1-choice process is self-stabilizing, the load properties in a "typical" state are poor: even an arrival rate of only λ = 1 − 1/n yields a superlinear maximum load.

2-choice Process:
The maximum load at an arbitrary time is (w.h.p.) bounded by $O\left(\log \frac{n}{1-\lambda}\right)$. This allows us to maintain an exponentially better system load compared to the 1-choice process; for any λ = 1 − 1/poly(n) the maximum load remains logarithmic.
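To get a feeling for the gap between the two bounds, the following minimal sketch evaluates both expressions for λ = 1 − 1/n. The function names and the parameter `gap` (standing for 1 − λ) are illustrative choices, and the hidden constants of the O-notation are simply dropped, so the numbers indicate orders of magnitude only.

```python
import math


def one_choice_bound(n: int, gap: float) -> float:
    """Evaluate 1/(1-lambda) * log(n/(1-lambda)), where gap = 1 - lambda.

    Assumption: the hidden constant of the O-notation is set to 1.
    """
    return (1.0 / gap) * math.log(n / gap)


def two_choice_bound(n: int, gap: float) -> float:
    """Evaluate log(n/(1-lambda)), where gap = 1 - lambda (constant dropped)."""
    return math.log(n / gap)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        gap = 1.0 / n  # corresponds to the arrival rate lambda = 1 - 1/n
        print(n, round(one_choice_bound(n, gap)), round(two_choice_bound(n, gap), 2))
```

For gap = 1/n the first expression equals 2n · ln n while the second equals 2 · ln n, which is the superlinear versus logarithmic behaviour described above.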

Related Work
Let us continue with an overview of related work. We start with classical results for sequential and finite balls-into-bins processes, go over to parallel settings, and give an overview of infinite and batch-based processes similar to ours. We also briefly mention some results from queuing theory (which is related but studies slightly different quality of service measures and system models).
Sequential Setting. There are many strong, well-known results for the classical, sequential balls-into-bins process. In the sequential setting, m balls are thrown one after another and allocated to n bins. For m = n, the maximum load of any bin is known to be (w.h.p.) (1 + o(1)) · ln(n)/ln ln n for the 1-choice process [13,20] and ln ln(n)/ln d + Θ(1) for the d-choice process with d ≥ 2 [4]. If m ≥ n · ln n, the maximum load increases to $m/n + \Theta\left(\sqrt{m \cdot \ln(n)/n}\right)$ [20] and m/n + ln ln(n)/ln d + Θ(1) [7], respectively. In particular, note that the number of balls above the average grows with m for d = 1 but is independent of m for d ≥ 2. This fundamental difference is known as the power of two choices. A similar (if slightly weaker) result was shown by Talwar and Wieder [24] using a quite elegant proof technique (which we also employ and generalize for our analysis in Section 3). Czumaj and Stemann [11] study adaptive allocation processes where the number of a ball's choices depends on the load of queried bins. The authors subsequently analyze a scenario that allows reallocations. Berenbrink et al. [9] adapt the threshold protocol from [2] (see below) to a sequential setting and m ≥ n balls. Here, ball i repeatedly chooses a random bin until it sees a load smaller than 1 + i/n. While this is a relatively strong assumption on the balls, this protocol needs only O(m) choices in total (allocation time) and achieves an almost optimal maximum load of m/n + 1.
Parallel Setting. Several papers (e.g., [2,22]) investigated parallel settings of multiple-choice games for the case m = n. Here, all m balls have to be allocated in parallel, but balls and bins might employ some (limited) communication. Adler et al. [2] consider a trade-off between the maximum load and the number of communication rounds r the balls need to decide for a target bin. Basically, bounds that are close to the classical (sequential) processes can only be achieved if r is close to the maximum load [2]. The authors also give a lower bound on the maximum load if r communication rounds are allowed, and Stemann [22] provides a matching upper bound via a collision-based protocol.
Infinite Processes. In infinite processes, the number of balls to be thrown is not fixed. Instead, in each of infinitely many rounds, balls are thrown or reallocated while bins possibly delete old balls. Azar et al. [4] consider an infinite, sequential process starting with n balls arbitrarily assigned to n bins. In each round one random ball is reallocated using the d-choice process. For any $t > cn^2 \log\log n$, the maximum load at time t is (w.h.p.) ln ln(n)/ln d + O(1).
Adler et al. [1] consider a system where in each round m ≤ n/9 balls are allocated. Bins have a FIFO-queue, and each arriving ball is stored in the queue of two random bins. After each round, every non-empty bin deletes its frontmost ball (which automatically removes its copy from the second random bin). It is shown that the expected waiting time is constant and the maximum waiting time is (w.h.p.) ln ln(n)/ln d + O(1). The restriction m ≤ n/9 is the major drawback of this process. A differential and experimental study of this process was conducted in [6]. The balls' arrival times are binomially distributed with parameters n and λ = m/n. Their results indicate a stable behaviour for λ ≤ 0.86. A similar model was considered by Mitzenmacher [18], who considers ball arrivals as a Poisson stream of rate λn for λ < 1. It is shown that the 2-choice process reduces the waiting time exponentially compared to the 1-choice process.
Czumaj [10] presents a framework to study the recovery time of discrete-time dynamic allocation processes. In each round one of n balls is reallocated using the d-choice process. The ball is chosen either by selecting a random bin or by selecting a random ball. From an arbitrary initial assignment, the system is shown to recover to the maximum load from [4] within $O(n^2 \ln n)$ rounds in the former and O(n ln n) rounds in the latter case. Becchetti et al. [5] consider a similar process with only one random choice per ball, also starting from an arbitrary initial assignment of n balls. In each round, one ball is chosen from every non-empty bin and reallocated randomly. The authors define a configuration to be legitimate if the maximum load is O(log n). They show that (w.h.p.) any state recovers in linear time to a legitimate state and maintains such a state for poly(n) rounds.
Batch-Processes. Batch-based processes allocate m balls to n bins in batches of (usually) n balls each, where each batch is allocated in parallel. They lie between (pure) parallel and sequential processes. For m = τ · n, Stemann [22] investigates a scenario with n players each having m/n balls. To allocate a ball, every player independently chooses two bins and allocates copies of the ball to both of them. Every bin has two queues (one for first copies, one for second copies) and processes one ball from each queue per round. When a ball is processed, its copy is removed from the system and the player is allowed to initiate the allocation of the next ball. If τ = ln n, all balls are processed in O(ln n) rounds and the waiting time is (w.h.p.) O(ln ln n). Berenbrink et al. [8] study the d-choice process in a scenario where m balls are allocated to n bins in batches of size n each. The authors show that the load of every bin is (w.h.p.) m/n ± O(log n). As noted in Lemma 3.5, our analysis can be used to derive the same result by easier means.
Queuing Processes. Batch arrival processes have also been considered in the context of queuing systems. A key motivation for such models stems from the asynchronous transfer mode (ATM) in telecommunication systems. Tasks arrive in batches and are stored in a FIFO queue. Several papers [3,15,16,21] consider scenarios where the number of arriving tasks is determined by a finite state Markov chain. Results study steady state properties of the system to determine properties of interest (e.g., waiting times or queue lengths). Sohraby and Zhang [21] use spectral techniques to study a multi-server scenario with an infinite queue. Alfa [3] considers a discrete-time process for n identical servers and tasks with constant service time s ≥ 1. To ensure a stable system, the arrival rate λ is assumed to be ≤ n/s and tasks are assigned cyclically, which allows the study of an arbitrary server (instead of the complete system). Kamal [15] and Kim et al. [16] study a system with a finite capacity. Tasks arriving when the buffer is full are lost. The authors study the steady state probability and give empirical results to show the decay of waiting times as n increases.

Model & Preliminaries
We model our load balancing problem as an infinite, parallel balls-into-bins process. Time is divided into discrete, synchronous rounds. There are n bins and n generators, and the initial system is assumed to be empty. At the start of each round, every non-empty bin deletes one ball. Afterward, every generator generates a ball with a probability of λ = λ(n) ∈ [0, 1] (the arrival rate). This generation scheme allows us to consider arrival rates that are arbitrarily close to one (like 1 − 1/poly(n)). Generated balls are distributed in the system using a distribution process. In this paper we analyze two specific distribution processes: (a) The 1-choice process Greedy[1] assigns every ball to a randomly chosen bin. (b) The 2-choice process Greedy[2] assigns every ball to a least loaded among two randomly chosen bins.
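The round structure described above can be sketched as a small simulation. This is an illustrative sketch, not the authors' implementation: the function name `simulate`, the `seed` parameter, drawing bin choices with replacement, the tie-breaking rule, and the decision that all balls of one batch compare the loads observed right after the deletion phase (so balls of the same batch do not see each other) are assumptions made for the example.

```python
import random


def simulate(n, lam, rounds, d, seed=0):
    """Run the batch process for `rounds` rounds and return the final loads.

    d = 1 corresponds to Greedy[1], d = 2 to Greedy[2].
    Assumptions of this sketch: bin choices are drawn with replacement, ties
    go to the first candidate, and all balls of a batch compare the loads
    observed right after the deletion phase.
    """
    rng = random.Random(seed)
    load = [0] * n  # the system starts empty
    for _ in range(rounds):
        # Deletion phase: every non-empty bin deletes one ball.
        load = [x - 1 if x > 0 else 0 for x in load]
        # Generation phase: each of the n generators creates a ball w.p. lam.
        arrivals = sum(1 for _ in range(n) if rng.random() < lam)
        # Allocation phase: balls of the same batch do not see each other.
        snapshot = list(load)
        for _ in range(arrivals):
            candidates = [rng.randrange(n) for _ in range(d)]
            target = min(candidates, key=lambda i: snapshot[i])
            load[target] += 1
    return load


if __name__ == "__main__":
    for d in (1, 2):
        final = simulate(n=100, lam=0.9, rounds=500, d=d)
        print(f"Greedy[{d}]: max load {max(final)}, total load {sum(final)}")
```

Running the script prints the maximum and total load of both processes for one seeded run; varying `lam` towards 1 gives an impression of how the two processes diverge.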
Notation. The random variable $X_i(t)$ denotes the load (number of balls) of the i-th fullest bin at the end of round t. Thus, the load situation (configuration) after round t can be described by the load vector $X(t) = (X_i(t))_{i \in [n]}$. We define $\varnothing(t) := \frac{1}{n}\sum_{i=1}^{n} X_i(t)$ as the average load at the end of round t. The value ν(t) denotes the fraction of non-empty bins after round t and η(t) := 1 − ν(t) the fraction of empty bins after round t. It will be useful to define $1_i(t) := \min(1, X_i(t))$ and $\eta_i(t) := 1_i(t) - \nu(t)$ (which equals η(t) if i is a non-empty bin and −ν(t) otherwise).
Markov Chain Preliminaries. The evolution of the load vector over time can be interpreted as a Markov chain, since X(t) depends only on X(t − 1) and the random choices during round t. We refer to this Markov chain as X. Note that X is time-homogeneous (transition probabilities are time-independent), irreducible (every state is reachable from every other state), and aperiodic (path lengths have no period; in fact, our chain is lazy). Recall that such a Markov chain is positive recurrent (or ergodic) if the probability to return to the start state is 1 and the expected return time is finite. In particular, this implies the existence of a unique stationary distribution. Positive recurrence is a standard formalization of the intuitive concept of stability. See [17] for an excellent introduction to Markov chains and the involved terminology.

The 1-Choice Process
We present two main results for the 1-choice process: Theorem 2.1 states the stability of the system under the 1-choice process for an arbitrary λ, using the standard notion of positive recurrence (cf. Section 1). In particular, this implies the existence of a stationary distribution for the 1-choice process. Theorem 2.2 strengthens this by giving a high probability bound on the maximum load for an arbitrary round t ∈ N. Together, both results imply that the 1-choice process is self-stabilizing.
Theorem 2.2. Fix an arbitrary round t of the 1-choice process. The maximum load of all bins is (w.h.p.) bounded by $O\left(\frac{1}{1-\lambda} \cdot \log \frac{n}{1-\lambda}\right)$.

Note that for high arrival rates of the form λ(n) = 1 − ε(n), the bound given in Theorem 2.2 is inversely proportional to ε(n). For example, for ε(n) = 1/n the maximal load is O(n log n). Theorem 2.3 shows that this dependence is unavoidable: the bound given in Theorem 2.2 is tight for large values of λ. In Section 3, we will see that the 2-choice process features an exponentially better behaviour for large λ.
there is a bin i in step t with load $\Omega\left(\frac{1}{1-\lambda} \cdot \log n\right)$.

The proofs of these results can be found in the following subsections. We first prove a bound on the maximum load (Theorem 2.2), then we prove stability of the system (Theorem 2.1), and finally we prove the lower bound (Theorem 2.3).

Maximum Load - Proof of Theorem 2.2
Proof of Theorem 2.2 (Maximum Load). We prove Theorem 2.2 using a (slightly simplified) drift theorem from Hajek [14] (cf. Theorem A.2 in Appendix A). Remember that, as mentioned in Section 1.2, our process is a Markov chain, so that we need to condition only on the previous state (instead of the full filtration from Theorem A.2). Our goal is to bound the load of a fixed bin i at time t using Theorem A.2 and, subsequently, to combine this with a union bound to bound the maximum load over all bins. To apply Theorem A.2, we have to prove that the maximum load difference of bin i between two rounds is exponentially bounded (Majorization) and that, given a high enough load, the system tends to lose load (Negative Bias). We start with the majorization. The load difference , where $B_i(t)$ is the number of balls bin i receives during round t + 1. In particular, we have is binomially distributed with parameters n and λ/n (each of the n balls has probability λ · 1/n to end up in i).
Using standard inequalities we bound and calculate This shows that the Majorization condition from Theorem A.2 holds (with $\lambda' = 1$ and D = Θ(1)). To see that the Negative Bias condition is also given, note that if bin i has non-zero load, it is guaranteed to delete one ball and receives in expectation We finally can apply Theorem A.2 with where c denotes a suitable constant. Applying a union bound to all n bins and choosing b :
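The negative bias used in this proof can be illustrated for a single bin. Since the number of balls a bin receives per round is binomially distributed with parameters n and λ/n, a non-empty bin loses one ball and gains n · λ/n = λ balls in expectation, for an expected change of λ − 1 < 0. The sketch below tracks one bin under this dynamic; the function names and the `seed` parameter are hypothetical, and the binomial sampler is a plain sum of Bernoulli trials.

```python
import random


def binomial(rng, trials, p):
    """Sample a Binomial(trials, p) value as a sum of Bernoulli trials."""
    return sum(1 for _ in range(trials) if rng.random() < p)


def expected_drift_nonempty(lam):
    """Expected load change of a non-empty bin: -1 deleted, +lam received."""
    return lam - 1.0


def simulate_bin(n, lam, rounds, seed=0):
    """Track the load of one fixed bin under the 1-choice process."""
    rng = random.Random(seed)
    load = 0
    history = []
    for _ in range(rounds):
        if load > 0:
            load -= 1  # a non-empty bin deletes exactly one ball
        load += binomial(rng, n, lam / n)  # arrivals are Binomial(n, lam/n)
        history.append(load)
    return history


if __name__ == "__main__":
    trace = simulate_bin(n=100, lam=0.95, rounds=2000)
    print("drift of a non-empty bin:", expected_drift_nonempty(0.95))
    print("max load seen:", max(trace))
```

The closer λ is to 1, the weaker the drift towards zero, which matches the 1/(1 − λ) factor in the bound of Theorem 2.2.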

Stability - Proof of Theorem 2.1
In the following, we provide an auxiliary lemma that will prove useful to derive the stability of the 1-choice process.
Lemma 2.4. Fix an arbitrary round t of the 1-choice process and a bin i. There is a constant c > 0 such that the expected load of bin i is bounded by 6c To get a bound on the expected load of bin i, note that the probability in Equation ( 3 This finishes the proof.

Proof of Theorem 2.1 (Stability). We prove Theorem 2.1 using a result from Fayolle et al. [12] (cf. Theorem A.1 in Appendix A). Note that X is a time-homogeneous irreducible Markov chain with a countable state space. For a configuration x we define the auxiliary potential $\Psi(x) := \sum_{i=1}^{n} x_i$ as the total system load of configuration x. Consider the (finite) set $C := \{\, x \mid \Psi(x) \le n^4/(1-\lambda)^3 \,\}$ of all configurations with not too much load. To prove positive recurrence, it remains to show that Condition (a) (expected potential drop outside of C, that is, in a high-load configuration) and Condition (b) (finite potential) of Theorem A.1 hold. In the following, let $\Delta := \frac{n^3}{(1-\lambda)^2}$. Let us start with Condition (a). So fix a round t and let $x = X(t) \notin C$. By definition of C, we have $\Psi(x) > n^4/(1-\lambda)^3$, so that there is at least one bin i with load $x_i \ge \Psi(x)/n > n^3/(1-\lambda)^2$. In particular, note that $x_i \ge \Delta$, so that during each of the next ∆ rounds exactly one ball is deleted from bin i. On the other hand, bin i receives in expectation $\Delta \cdot \lambda n \cdot \frac{1}{n} = \lambda\Delta$ balls during the next ∆ rounds. We get For any bin j ≠ i, we assume pessimistically that no ball is deleted. Note that the expected load increase of each of these bins can be majorized by the load increase in an empty system running for ∆ rounds. Thus, we can use Lemma 2.4 to bound the expected load increase in each of these bins by 6c 1 We get This proves Condition (a) of Theorem A.1. For Condition (b), assume x = X(t) ∈ C. We bound the system load after ∆ rounds trivially by (note that the finiteness in Theorem A.1 is with respect to time, not n). This finishes the proof.

Lower Bound on Maximum Load - Proof of Theorem 2.3
Proof of Theorem 2.3 (Lower Bound). To show this result we will use the bound of Theorem A.3, which lower bounds the maximum number of balls a bin receives when m balls are allocated into n bins. The idea of the proof is as follows. We assume that we start at an empty system and apply Theorem A.3 on m = λtn many balls. The theorem says that one of the bins is likely to get much more than λt many balls, which allows us to show that the load of this bin is large, even if the bin was able to delete a ball during each of the t observed time steps. Let m(t) be the number of balls allocated during the first t steps and let $b_u(t)$ be the number of these balls that are allocated to bin u. Set $t = 9\lambda \log(n)/(64(1-\lambda)^2)$, assume λ > 0.5 and assume $\lambda n t \le n \cdot (\log n)^c$ for a constant c. Since the expected number of balls is λnt ≥ n log n we can use Chernoff bounds to show that w.h.p. at least (1 − ε) · m(t) balls are generated for very small ε. Then Using Chernoff's inequality we can show that w.h.p.
· λn for an arbitrarily small constant ε. By Theorem A.3 (Case 3) with α = 8/9 we get (w.h.p.) We derive

The 2-Choice Process
We continue with the study of the 2-choice process. Here, new balls are distributed according to Greedy[2] (cf. description in Section 1.2). Our main results are the following theorems, which are equivalent to the corresponding theorems for the 1-choice process.
Theorem 3.2. Fix an arbitrary round t of the 2-choice process. The maximum load of all bins is (w.h.p.) bounded by $O\left(\log \frac{n}{1-\lambda}\right)$.

Note that Theorem 3.2 implies a much better behaved system than we saw in Theorem 2.2 for the 1-choice process. In particular, it allows for an exponentially higher arrival rate: for λ(n) = 1 − 1/poly(n) the 2-choice process maintains a maximal load of O(log n). In contrast, for the same arrival rate the 1-choice process results in a system with maximal load Ω(poly(n)).
Our analysis of the 2-choice process relies largely on a good bound on the smoothness (the maximum load difference between any two bins). This is stated in the following lemma. This result is of independent interest, showing that even if the arrival rate is $1 - e^{-n}$, where we get a polynomial system load, the maximum load difference is still logarithmic.
Lemma 3.3. Fix an arbitrary round t of the 2-choice process. The load difference between any two bins is (w.h.p.) bounded by O(ln n).
Analysis Overview. To prove these results, we combine three different potential functions: For a configuration x with average load ∅ and for a suitable constant α (to be fixed later), we define

$$\Phi(x) := \sum_{i=1}^{n} \left( e^{\alpha \cdot (x_i - \varnothing)} + e^{\alpha \cdot (\varnothing - x_i)} \right), \qquad \Psi(x) := \sum_{i=1}^{n} x_i, \qquad \text{and} \qquad \Gamma(x) := \Phi(x) + \Psi(x). \tag{8}$$

The potential Φ measures the smoothness (basically the maximum load difference to the average) of a configuration and is used to prove Lemma 3.3 (Section 3.1). The proof is based on the observation that whenever the load of a bin is far from the average load, it decreases in expectation. The potential Ψ measures the total load of a configuration and is used, in combination with our results on the smoothness, to prove Theorem 3.2 (Section 3.2). The potential Γ entangles the smoothness and total load, allowing us to prove Theorem 3.1 (Section 3.3). The proof is based on the fact that whenever Γ is large (i.e., the configuration is not smooth or it has a huge total load) it decreases in expectation. Before we continue with our analysis, let us make a simple but useful observation concerning the smoothness: For any configuration x and value b ≥ 0, the inequality $\Phi(x) \le e^{\alpha \cdot b}$ implies (by definition of Φ) $\max_i |x_i - \varnothing| \le b$. That is, the load difference of any bin to the average is at most b and, thus, the load difference between any two bins is at most 2b. We capture this in the following observation.

Observation 3.4. Let b ≥ 0 and let x be a configuration with average load ∅. If $\Phi(x) \le e^{\alpha \cdot b}$, then $|x_i - \varnothing| \le b$ for all bins i. In particular, $\max_i x_i - \min_i x_i \le 2b$.
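The relation between the smoothness potential and the load deviation can be checked numerically. The sketch below takes Φ to be the sum, over all bins, of e^(α·(x_i − ∅)) + e^(α·(∅ − x_i)) and Ψ to be the total load; the helper names (`phi`, `psi`, `deviation_bound`, `max_deviation`) and the sample configuration are illustrative assumptions.

```python
import math


def average_load(x):
    """Average load of configuration x."""
    return sum(x) / len(x)


def phi(x, alpha):
    """Smoothness potential: sum of exp(+-alpha * (x_i - average))."""
    avg = average_load(x)
    return sum(math.exp(alpha * (xi - avg)) + math.exp(alpha * (avg - xi)) for xi in x)


def psi(x):
    """Total load of configuration x."""
    return sum(x)


def deviation_bound(x, alpha):
    """Smallest b with phi(x) <= exp(alpha * b), i.e. b = ln(phi(x)) / alpha."""
    return math.log(phi(x, alpha)) / alpha


def max_deviation(x):
    """Largest distance of any bin's load to the average load."""
    avg = average_load(x)
    return max(abs(xi - avg) for xi in x)


if __name__ == "__main__":
    x = [0, 3, 5, 8]
    print("phi:", phi(x, 0.1), "psi:", psi(x))
    print("max deviation:", max_deviation(x), "<= b =", deviation_bound(x, 0.1))
```

Each bin contributes at least e^(α·|x_i − ∅|) to Φ, which is why the maximum deviation can never exceed ln(Φ(x))/α.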

Bounding the Smoothness
The goal of this section is to prove Lemma 3.3. To do so, we show the following bound on the expected smoothness (potential Φ) at an arbitrary time t:

Lemma 3.5. Fix an arbitrary round t of the 2-choice process. There is a constant

Note that Lemma 3.5 together with Observation 3.4 immediately implies Lemma 3.3 by a simple application of Markov's inequality to bound the probability that $\Phi(X(t)) \ge n^2/\varepsilon$.
Our proof of Lemma 3.5 follows the lines of [19,24], which used the same potential function to analyze variants of the sequential d-choice process without deletions. While the basic idea of showing a relative drop when the potential is high combined with a bounded absolute increase in the general case is the same, our analysis turns out to be much more involved. In particular, not only do we have to deal with deletions and throwing balls in batches, but the size of each batch is also a random variable. Once Lemma 3.5 is proven, Lemma 3.3 emerges by combining Observation 3.4, Lemma 3.5, and Markov's inequality as follows:

It remains to prove Lemma 3.5. Remember the definition of Φ(x) from Equation (8). We split the potential into two parts, $\Phi(x) = \Phi_+(x) + \Phi_-(x)$. Here, $\Phi_+(x) := \sum_i e^{\alpha \cdot (x_i - \varnothing)}$ denotes the upper potential of x and $\Phi_-(x) := \sum_i e^{\alpha \cdot (\varnothing - x_i)}$ denotes the lower potential of x. For a fixed bin i, we use $\Phi_{i,+}(x) := e^{\alpha \cdot (x_i - \varnothing)}$ and $\Phi_{i,-}(x) := e^{\alpha \cdot (\varnothing - x_i)}$ to denote i's contribution to the upper and lower potential, respectively. When we consider the effect of a fixed round t + 1, we will sometimes omit the time parameter and use prime notation to denote the value of a parameter at the end of round t + 1.
For example, we write $X_i$ and $X_i'$ for the load of bin i at the beginning and at the end of round t + 1, respectively. We start with two simple but useful identities regarding the potential drop $\Delta_{i,+}(t+1)$ (and $\Delta_{i,-}(t+1)$) due to a fixed bin i during round t + 1.
Observation 3.6. Fix a bin i, let K denote the number of balls that are placed during round t + 1, and let k ≤ K be the number of these balls that fall into bin i.
We now derive the main technical lemma that states general bounds on the expected upper and lower potential change during a single round. This will be used to derive bounds on the potential change in different situations. For this, let the probability that a ball thrown with Greedy[2] falls into the i-th fullest bin). We also define $\hat{\alpha} := e^{\alpha} - 1$ and $\check{\alpha} := 1 - e^{-\alpha}$. Note that $\hat{\alpha} \in (\alpha, \alpha + \alpha^2)$ and $\check{\alpha} \in (\alpha - \alpha^2, \alpha)$ for α ∈ (0, 1.7). This follows easily from the Taylor approximation $e^x \le 1 + x + x^2$, which holds for any x ∈ (−∞, 1.7] (we will use this approximation several times in the analysis). Finally, let δi : These $\hat{\delta}_i$ and $\check{\delta}_i$ values can be thought of as upper/lower bounds on the expected difference in the number of balls that fall into bin i under the 1-choice and 2-choice process, respectively (note that 1, 1, $\hat{\alpha}/\alpha$, and $\check{\alpha}/\alpha$ are all constants close to 1).

Lemma 3.7. Consider a bin i after round t and a constant α ≤ 1.
(a) For the expected change of i's upper potential during round t + 1 we have
(b) For the expected change of i's lower potential during round t + 1 we have

Proof. For the first statement, we use Observation 3.6 to calculate where we first apply the law of total expectation together with Observation 3.6 and, afterward, twice the binomial theorem. Continuing the calculation using the aforementioned Taylor approximation $e^x \le 1 + x + x^2$ (which holds for any x ∈ (−∞, 1.7]) and the definition of δi yields Now, the claim follows by another application of the Taylor approximation. The second statement follows similarly.

Footnote 2: For Φ, the condition λ ≥ 1/4 can be substituted with λ = Ω(1) with only minor changes in the analysis. Moreover, the analysis can be easily adapted for a process that (deterministically) throws λ · n balls in each round, even for λ > 1 as long as it is a constant. Finally, one can easily adapt the analysis to cover the process without deletions by setting $\eta_i(t) = 0$ (see Observation 3.6). Using Markov's inequality, this yields the same result as [8] using a simpler analysis.
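The Taylor approximation used throughout this analysis is easy to sanity-check numerically. The sketch below verifies e^x ≤ 1 + x + x² on a grid over [−10, 1.7] and checks the resulting enclosures of e^α − 1 and 1 − e^(−α). The names `taylor_gap`, `alpha_up`, and `alpha_low` are hypothetical labels for this example, and a grid check is only an illustration, not a proof.

```python
import math


def taylor_gap(x):
    """Return (1 + x + x^2) - e^x; non-negative wherever the bound holds."""
    return 1 + x + x * x - math.exp(x)


def alpha_up(alpha):
    """The quantity e^alpha - 1, which lies in (alpha, alpha + alpha^2)."""
    return math.exp(alpha) - 1


def alpha_low(alpha):
    """The quantity 1 - e^(-alpha), which lies in (alpha - alpha^2, alpha)."""
    return 1 - math.exp(-alpha)


if __name__ == "__main__":
    grid = [k / 100 for k in range(-1000, 171)]  # x in [-10, 1.7]
    print("bound holds on grid:", all(taylor_gap(x) >= 0 for x in grid))
    print("bound at x = 2.0 fails:", taylor_gap(2.0) < 0)
    a = 0.1
    print(a, "<", alpha_up(a), "<", a + a * a)
    print(a - a * a, "<", alpha_low(a), "<", a)
```

The check at x = 2.0 shows that the inequality does stop holding a little beyond 1.7, so the restriction on the range of x is needed.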
Using Lemma 3.7, we derive different bounds on the potential drop that will be used in the various situations. The proofs for the following statements can all be found in Appendix C.
We start with a result that will be used when the potential is relatively high. For the expected upper and lower potential drop during round t + 1 we have

The next lemma derives a bound that is used to bound the upper potential change in reasonably balanced configurations.

Lemma 3.9. Consider a round t and the constants ε (from Claim B.2) and α ≤ min(ln(10/9), ε/4). Let λ ∈ [1/4, 1] and assume $X_{\frac{3}{4}n}(t) \le \varnothing(t)$. For the expected upper potential drop during round t + 1 we have

The next lemma derives a bound that is used to bound the lower potential drop in reasonably balanced configurations.
Lemma 3.10. Consider a round t and the constants ε (from Claim B.2) and α ≤ min(ln(10/9), ε/8). Let λ ∈ [1/4, 1] and assume $X_{\frac{n}{4}}(t) \ge \varnothing(t)$. For the expected lower potential drop during round t we have

The next lemma derives a bound that will be used to bound the potential drop in configurations with many balls far below the average to the right.

Lemma 3.11. Consider a round t and constants α ≤ 1/46 (< ln(10/9)) and ε ≤ 1/3. Let λ ∈ [1/4, 1] and assume $X_{\frac{3}{4}n}(t) \ge \varnothing(t)$ and E[∆

The next lemma derives a bound that will be used to bound the potential drop in configurations with many balls far above the average to the left.

Lemma 3.12. Consider a round t and constants α ≤ 1/32 (< ln(10/9)) and ε

Putting all these lemmas together, we can derive the following bound on the potential change during a single round.

Lemma 3.13. Consider an arbitrary round t + 1 of the 2-choice process and the constants ε (from Claim B.2) and α ≤ min(ln(10/9), ε/8). For λ ∈ [1/4, 1] we have

We can use this result in a simple induction to prove Lemma 3.5.
Proof of Lemma 3.5. Lemma 3.13 gives us a γ < 1 and c > 0 such that E[Φ(X(t + 1)) | X(t)] ≤ γ · Φ(X(t)) + c holds for all rounds t ≥ 0. Taking the expected value on both sides yields E[Φ(X(t + 1))] ≤ γ · E[Φ(X(t))] + c. Using induction and the linearity of the expected value, it is easy to check that $E[\Phi(X(t))] \le \frac{c}{1-\gamma}$ solves this recursion. Using the values from Lemma 3.13 for γ and c (substituting . The lemma's statement follows for the constant ε = O ε −9 /(αλ) .
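The induction in this proof is the standard argument for a recursion of the form a(t + 1) ≤ γ · a(t) + c with γ < 1: once a value is at most c/(1 − γ), all later values are as well. The sketch below iterates the recursion with equality, which is the worst case; the concrete values of γ, c, and the start value are arbitrary illustrative choices and not the constants from Lemma 3.13.

```python
def iterate_bound(gamma, c, start, rounds):
    """Iterate a(t+1) = gamma * a(t) + c and return the list a(0), ..., a(rounds)."""
    values = [start]
    for _ in range(rounds):
        values.append(gamma * values[-1] + c)
    return values


def fixed_point(gamma, c):
    """The value c / (1 - gamma), which the recursion maps to itself."""
    return c / (1 - gamma)


if __name__ == "__main__":
    gamma, c = 0.5, 1.0
    trace = iterate_bound(gamma, c, start=0.0, rounds=10)
    print("limit:", fixed_point(gamma, c))
    print("trace:", trace)
```

Starting below the fixed point, the iterates increase towards c/(1 − γ) without exceeding it; starting above, they decrease towards it geometrically.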

Bounding the Maximum Load
The goal of this section is to prove Theorem 3.2. Remember the definitions of Φ(x) and Ψ(x) from Equation (8). For any fixed round t, we will prove that (w.h.p.) Ψ(X(t)) = O(n · ln n), so that the average load is ∅ = O(ln n). Using a union bound and Lemma 3.3, we see that (w.h.p.) the maximum load at the end of round t is bounded by

It remains to prove a high probability bound on Ψ(X(t)) for arbitrary t. To get an intuition for our analysis, consider the toy case t = poly(n) and assume that exactly λ · n ≤ n balls are thrown each round. Here, we can combine Observation 3.4 and Lemma 3.5 to bound (w.h.p.) the load difference between any pair of bins and for all t′ < t by O(ln n) (via a union bound over poly(n) rounds). Using the combinatorial observation that, while the load distance to the average is bounded by some b ≥ 0, the bound Ψ ≤ 2b · n is invariant under the 2-choice process (Lemma 3.14), we get for b = O(ln n) that Ψ(X(t)) ≤ 2b · n = O(n · ln n), as required. The case for t = ω(poly(n)) is considerably more involved. In particular, the fact that the number of balls in the system is only guaranteed to decrease when the total load is high and the load distance to the average is low makes it challenging to design a suitable potential function that drops fast enough when it is high. Thus, we deviate from this standard technique and elaborate on the idea of the toy case: Instead of bounding (w.h.p.) the load difference between any pair of bins by O(ln n) for all t′ < t (which is not possible for t ≫ poly(n)), we prove (w.h.p.) an adaptive bound of O(ln(t − t′) · f(λ)) for all t′ < t, where f is a suitable function (Lemma 3.15). Then we consider the last round t′ < t with an empty bin. Observation 3.4 yields a bound of · n on the total load at time t′. Using the same combinatorial observation as in the toy case, we get that (w.h.p.)
The final step is to show that the load at time t′ (which is logarithmic in t − t′) decreases linearly in t − t′, showing that the time interval t − t′ cannot be too large (or we would get a negative load at time t). See Figure 1 for an illustration.

Proof. We distinguish two cases: If there is no empty bin, then all n bins delete one ball. Since the maximum number of new balls is n, the number of balls cannot increase. That is, we have Ψ(x′) ≤ Ψ(x) ≤ 2b · n. Now consider the case that there is at least one empty bin. Let η ∈ (0, 1] denote the fraction of empty bins (i.e., there are exactly η · n > 0 empty bins). Since the minimal load is zero, Observation 3.4 implies $\max_i x_i \le 2b$. Thus, the total number of balls in configuration x is at most (1 − η)n · 2b. Exactly (1 − η)n balls are deleted (one from each non-empty bin) and at most n new balls enter the system. We get

Figure 1: To bound the system load at time t, consider the minimum load and our bound on the load difference over time. There was a last time t′ when there was an empty bin. The system load can only increase if there is an empty bin, and this increase is bounded by our bound on the load difference. Exploiting that the system load decreases linearly in time while every increase is bounded by our logarithmic bound on the load difference, we find a small interval [t′, t] containing t′.
Let $Y_i$ be the number of balls that spawn in $I_i$.
(a) Define the (good) smooth event
(b) Define the (good) bounded balls event

Using the union bound over all t′ < t we calculate where the last inequality applies the solution to the Basel problem. This proves the first statement.
For the second statement, let The desired statement follows by applying the identity $Z_i = |I_i| \cdot n - Y_i$ and taking the union bound.

Lemma 3.16. Fix a round t and assume that both $S_t$ and $B_t$ hold. Then $\Psi(X(t)) \le \frac{9n}{\alpha} \cdot \ln \frac{n}{1-\lambda}$.

Proof. Let t′ < t be the last time when there was an empty bin and set ∆ := t − t′. Note that t′ is well defined, as we have . By choice of t′ we have $\min_i X_i(t') = 0$. Together with Observation 3.4 we get $\max_i X_i(t') \le 2\ln(\Delta^2 \cdot n^2)/\alpha$. Summing up over all bins (and pulling out the square), this implies $\Psi(X(t')) \le 4n \cdot \ln(\Delta \cdot n)/\alpha$. Applying Lemma 3.14 yields $\Psi(X(t'+1)) \le 4n \cdot \ln(\Delta \cdot n)/\alpha$. By choice of t′, there is no empty bin in X(t″) for all t″ ∈ {t′ + 1, t′ + 2, ..., t − 1}. Thus, during each of these rounds exactly n balls are deleted. To bound the number of deleted balls, let i be maximal with $I_i \subseteq [t', t]$ (as defined in Lemma 3.15). Since $B_t$ holds and using the maximality of i, the number of balls Y that spawn during [t′, t] is at most With where $W_{-1}$ denotes the lower branch of the Lambert W function. This implies that $\Delta \le -f \cdot W_{-1}\left(-\frac{1}{fn}\right)$, since otherwise we would have Ψ(X(t)) < 0, which is clearly a contradiction. Using the Taylor approximation $W_{-1}(x) = \ln(-x) - \ln\ln(-1/x) - o(1)$ as $x \to 0^-$, we get Finally, we use this bound on ∆ to get

Now, by combining Lemma 3.16 with the fact that the events $S_t$ and $B_t$ hold with high probability (Lemma 3.15), we immediately get that (w.h.p.) Ψ(X(t)) = O(n · ln n). As described at the beginning of this section, combining this with Lemma 3.3 proves Theorem 3.2.
We are ready to prove Theorem 3.1.
Proof of Theorem 3.1. The proof proceeds by applying Theorem A.1, whose parameters we now define. Let $\zeta(t) = X(t)$, so that $\Omega$ is the state space of $X$. First, we observe that $\Omega$ is countable, since there is a constant number of bins ($n$ is considered a constant here), each having a load that is a natural number. We define $\varphi(X(t))$ to be $\Gamma(X(t))$. We define C = {x : Γ(x) ≤ 2 n 4 (1−λ) 2 λ }. Define $\beta(x) = 1$ and $\eta = 1$. We now show that the preconditions (a) and (b) of Theorem A.1 are fulfilled.
• Let $x \notin C$. By the definition of $C$ and $\varphi(X(t))$, and from Lemma 3.17, we have […].
• Let $x \in C$. Recall that $\Gamma(X(t)) = \Phi(X(t)) + \Psi(X(t))$. By Lemma 3.12 and the fact that the number of balls arriving in one round is bounded by $n$, we derive […].

The claim follows by applying Theorem A.1 with Equations (21) and (22).

A Auxiliary Results
Theorem A.
Theorem A.2 (Simplified version of Hajek [14, Theorem 2.3]). Let $(Y(t))_{t \ge 0}$ be a sequence of random variables on a probability space $(\Omega, \mathcal{F}, P)$ with respect to the filtration $(\mathcal{F}(t))_{t \ge 0}$. Assume the following two conditions hold:
(i) (Majorization) There exists a random variable $Z$ and a constant $\lambda' > 0$ such that $E[e^{\lambda' Z}] \le D$ for some finite $D$, and $(|Y(t+1) - Y(t)| \mid \mathcal{F}(t)) \prec Z$ for all $t \ge 0$; and
(ii) (Negative Bias) There exist $a, \varepsilon_0 > 0$ such that for all $t$ we have […].
Then, for all $b$ and $t$ we have […].

Proof. The statement of the theorem provided in [14] requires, besides (i) and (ii), choosing constants $\eta$ and $\rho$ such that $0 < \eta \le \lambda'$, $\eta < \varepsilon_0/c$, and $\rho = 1$ […]. With these requirements it then holds that for all $b$ and $t$ […]. In the following we bound (23) by setting $\eta = \min\{\lambda', \varepsilon_0 \cdot \lambda'^2/(2D), 1/(2\varepsilon_0)\}$. The following upper and lower bounds on $\rho$ follow.
Theorem A.3 (Raab and Steger [20, Theorem 1]). Let $M$ be the random variable that counts the maximum number of balls in any bin if we throw $m$ balls independently and uniformly at random into $n$ bins. Then […], where […].

Proof. Remember that $\delta_i$: […] and […].

Proof. The claim follows from comments in [24]. For Equation (25), recall that $\sum_{i < 3n/4} \Phi_{i,+} \le \Phi_+$ (by definition). Since $\Phi_{i,+}$ for $i = 1, \dots, n$ is non-increasing, where $i$ denotes the $i$-th loaded bin, the above equation is maximized when all $\Phi_{i,+} = \frac{4\Phi_+}{3n}$. The following observation can be found in [23]: $\sum_{i \ge 3n/4}$ […]. The result follows from combining these two facts.
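The random variable $M$ of Theorem A.3 is easy to sample, which gives a feel for the maximum load of a single batch. The following is a minimal sketch of that experiment: throw $m$ balls independently and uniformly at random into $n$ bins and report the fullest bin. The function name `max_load` and the use of Python's `random.Random` as the source of randomness are assumptions of this illustration.

```python
import random
from collections import Counter

def max_load(m, n, rng):
    """Throw m balls independently and uniformly at random into n bins
    and return the maximum number of balls in any bin."""
    counts = Counter(rng.randrange(n) for _ in range(m))
    return max(counts.values()) if counts else 0
```

Calling `max_load(n, n, random.Random(seed))` for growing $n$ and several seeds samples the maximum load of one batch with $m = n$ balls.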
Claim B.3. Consider a round $t$ and a constant $\alpha \ge 0$. The following inequalities hold: […].

Proof. For the first statement, we calculate […], where the first inequality uses that $\Phi_{i,+}(X(t))$ is non-increasing in $i$ and that $\Phi_{i,+}(X(t)) \le 1$ for all $i > \nu n$. The claim's second statement follows by a similar calculation, using that $\Phi_{i,-}(X(t))$ is non-decreasing in $i$ (note that we cannot apply the same trick as above to get $\min(n, \Phi_-(X(t)))$ instead of $\Phi_-(X(t))$).

C Missing Proofs for the 2-Choice Process
Proof of Observation 3.6. Remember that $\mathbb{1}_i$ is an indicator value which equals 1 if and only if the $i$-th bin is non-empty in configuration $X$. Bin $i$ loses exactly $\mathbb{1}_i$ balls and receives exactly $k$ balls, such that $X'_i - X_i = -\mathbb{1}_i + k$. Similarly, we have $\varnothing' - \varnothing = -\nu + K/n$ for the change of the average load. With the identity $\eta_i = \mathbb{1}_i - \nu$ (see Section 1.2), this yields $(X'_i - \varnothing') - (X_i - \varnothing) = -\eta_i + k - K/n$, proving the first statement. The second statement follows similarly.
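The bookkeeping in this proof can be traced on a concrete round. The following is a minimal sketch under several assumptions that are not taken verbatim from the paper: each of $n$ generators spawns a ball with probability $\lambda$, every ball of a batch compares two uniformly chosen bins using the loads after deletion (balls of the same batch do not see each other), and ties go to the first choice. The function name `two_choice_round` is hypothetical.

```python
import random

def two_choice_round(loads, lam, rng):
    """One round: every non-empty bin deletes one ball, then each of n
    generators spawns a ball with probability lam and sends it to the
    lesser loaded of two random bins (loads measured after deletion).
    Returns (new_loads, nu, K), where nu is the fraction of non-empty
    bins before deletion and K is the number of spawned balls."""
    n = len(loads)
    nu = sum(1 for x in loads if x > 0) / n
    after_deletion = [x - 1 if x > 0 else 0 for x in loads]
    received = [0] * n
    for _ in range(n):
        if rng.random() < lam:
            i, j = rng.randrange(n), rng.randrange(n)
            target = i if after_deletion[i] <= after_deletion[j] else j
            received[target] += 1
    new_loads = [x + k for x, k in zip(after_deletion, received)]
    return new_loads, nu, sum(received)
```

Since every non-empty bin loses exactly one ball and $K$ balls arrive in total, the average load returned by this sketch changes by exactly $-\nu + K/n$, matching the identity used in the proof.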
Proof of Lemma 3.8. We prove the statement for $R = +$. The case $R = -$ follows similarly. Using Lemma 3.7 and summing up over all $i \in [n]$ we get […]. Here, the last inequality uses $\lambda \le 1$ and $|\delta_i| \le \frac{5}{4}\lambda$ (Claim B.1). We now apply Claim B.3, $\nu\eta \le 1/4 \le \lambda$, and $\alpha < 1/8$ to get […].

Proof of Lemma 3.9. To calculate the expected upper potential change, we use Lemma 3.7 and sum up over all $i \in [n]$ (using similar inequalities as in the proof of Lemma 3.8 and the definition of $\delta_i$): […]. We now use that $\Phi_{i,+} = e^{\alpha \cdot (X_i - \varnothing)} \le 1$ for all $i > \frac{3}{4}n$ (by our assumption on $X_{\frac{3}{4}n}$). This yields […]. Finally, we apply Claim B.2 and the definition of 1 and $\alpha$ to get […]. Using $\alpha \le \varepsilon/4$ yields the desired result.
Proof of Lemma 3.10. To calculate the expected lower potential change, we use Lemma 3.7 and sum up over all $i \in [n]$ (as in the proof of Lemma 3.9): […]. We now use that $\Phi_{i,-} = e^{\alpha \cdot (\varnothing - X_i)} \le 1$ for all $i \le \frac{n}{4}$ (by our assumption on $X_{\frac{n}{4}}$) and apply Claim B.2 to get […], where the last inequality uses the definitions of 1, $\alpha$, as well as $\alpha > \alpha - \alpha^2$. Using $\alpha \le \varepsilon/8$ yields the desired result.
Proof of Lemma 3.11. Let $L := \sum_{i \in [n]} \max(X_i - \varnothing, 0) = \sum_{i \in [n]} \max(\varnothing - X_i, 0)$ be the "excess load" above and below the average. First note that the assumption $X_{\frac{3}{4}n} \ge \varnothing$ implies $\Phi_- \ge \frac{n}{4} \cdot \exp\left(\frac{\alpha L}{n/4}\right)$ (using Jensen's inequality). On the other hand, we can use the assumption $E[\Delta_+(t+1) \mid X] \ge -\frac{\varepsilon\alpha\lambda}{4} \cdot \Phi_+$ to show an upper bound on $\Phi_+$. To this end, we use Lemma 3.7 and sum up over all $i \in [n]$ (as in the proof of Lemma 3.9): […]. For $i \le n/3$ we have $p_i = \frac{2i-1}{n^2} \le \frac{2}{3n}$ and, using the definition of 1 and $\alpha$, $\delta_i = \lambda n$ […], where the last inequality uses $\alpha \le 1/46 \le$ […]. […] (the last inequality uses that none of the $2n/3$ remaining bins can have a load higher than $L/(n/3)$). To finish the proof, assume $\Phi_+ > \frac{\varepsilon}{4} \cdot \Phi_-$ (otherwise the lemma holds). Combining this with the upper bound on $\Phi_+$ and with the lower bound on $\Phi_-$, we get $\frac{16n}{3\varepsilon} e$ […]. Thus, the excess load can be bounded by $L < \frac{n}{\alpha} \cdot \ln\frac{256}{3\varepsilon^2}$. Now, the lemma's statement follows from […].

Proof of Lemma 3.12. Let $L := \sum_{i \in [n]} \max(X_i - \varnothing, 0) = \sum_{i \in [n]} \max(\varnothing - X_i, 0)$ be the "excess load" above and below the average. First note that the assumption $X_{\frac{n}{4}} \le \varnothing$ implies $\Phi_+ \ge \frac{n}{4} \cdot e^{\frac{\alpha L}{n/4}}$ (using Jensen's inequality). On the other hand, we can use the assumption $E[\Delta_-(t+1) \mid X] \ge -\frac{\varepsilon\alpha\lambda}{4} \cdot \Phi_-$ to show an upper bound on $\Phi_-$. To this end, we use Lemma 3.7 and sum up over all $i \in [n]$ (as in the proof of Lemma 3.10): […]. For $i \ge 2n/3$ we have $p_i = \frac{2i-1}{n^2} \ge$