Multisection: Parallelized Bisection

Effectively Using Asynchronous Information

Stephen Pallone
Peter Frazier
Shane Henderson

School of Operations Research and Information Engineering,
Cornell University

Bisection Algorithm

Goal: Find the zero of a deterministic, one-dimensional monotonic function.

bisection3.svg

Bisection Algorithm

Goal: Find the zero of a deterministic, one-dimensional monotonic function.

bisection2.svg

Bisection Algorithm

Goal: Find the zero of a deterministic, one-dimensional monotonic function.

bisection1.svg

Multisection Algorithm

Goal: Find the zero of a deterministic, one-dimensional monotonic function.

multi3.svg

Multisection Algorithm

Goal: Find the zero of a deterministic, one-dimensional monotonic function.

multi2.svg

Multisection Algorithm

Goal: Find the zero of a deterministic, one-dimensional monotonic function.

multi1.svg

Modeling Computational States

Keeping track of unfinished jobs and assigning new queries

state3.svg

Modeling Computational States

Keeping track of unfinished jobs and assigning new queries

state2.svg

Modeling Computational States

Keeping track of unfinished jobs and assigning new queries

state1.svg

Modeling Computational States

Keeping track of unfinished jobs and assigning new queries

Modeling Transitions

How allocations transition to computational states

transition3.svg

Modeling Transitions

How allocations transition to computational states

transition2.svg

Modeling Transitions

How allocations transition to computational states

transition1.svg

Modeling Transitions

How allocations transition to computational states

Defining the Objective Function

Algorithmic Parameters

Objective: given risk preference and computational budget, minimize interval length.

\[ \sup_\pi \mathbb{E}^\pi\left[ R(S_\tau) \right] \]

Modeling Waiting Times for Queries

Deterministic and Equal Waiting Times

IID Exponential Waiting Times

Dynamic Programming

Value Function

Intuition

Shifting Computational States

normstate1.svg normstate2.svg

Scaling Computational States

normstate1.svg normstate3.svg

Structural Properties of Value Function

Theorem 1: Shifting and Scaling

Suppose that \(R(S) = \|S\|^{-r}\) for \(r\in (0,1]\). For \(c,h\in\mathbb{R}\) with \(h>0\), the value function \(V\) has the following properties:

  1. Shift Invariance: \(V(S+c) = V(S)\)
  2. Inverse Scaling : \(V(h S) = h^{-r}\, V(S)\)
We can now focus solely on states \(S\) with \(S^{(0)}=0\) and \(S^{(m+1)}=1\).

Deterministic Waiting Times

Key Observation: only one possible normalized state.

unif3.svg

Deterministic Waiting Times

Key Observation: only one possible normalized state.

unif2.svg

Deterministic Waiting Times

Key Observation: only one possible normalized state.

unif1.svg

Deterministic Waiting Times

Corollary 1: Equally Spaced Queries are Optimal

Suppose function evaluations take a deterministic and equal amount of time to compute, and let \(R(\cdot)\) be defined as before. If \(X_j = (X_j^{(0)},X_j^{(1)},\dots,X_j^{(n+1)})\) is defined as \[ X_j^{(i)} = A_j + \frac{i}{n+1} \left(B_j-A_j\right), \] then \(X_j\) is an optimal allocation for \(S_j = (A_j,B_j)\).

IID Exponential Waiting Times

To Stack or Not to Stack

Stacking Policies

Why would anyone do this?

To Stack or Not to Stack

Stacking allows the acceleration of function evaluations.

stack3.svg

To Stack or Not to Stack

Stacking allows the acceleration of function evaluations.

stack2.svg

To Stack or Not to Stack

Stacking allows the acceleration of function evaluations.

stack1.svg

Optimal Stacking Policy

Theorem 3: Midpoint Stacking Policy

When stacking is allowed, stacking all queries at the midpoint is the optimal allocation for independent exponentially distribution waiting times with uniform rates. Equivalently, for states \(S_j = (A_j,B_j)\), the optimal allocation \(X_j\) is such that \[ X_j^{(i)} = A_j + \frac{1}{2}(B_j - A_j) \quad\quad \forall i=1,\dots,n.\]

Now Assume No Stacking

gsection3.svg

Now Assume No Stacking

gsection2.svg

Now Assume No Stacking

gsection1.svg

Golden Section Policy (n=2)

We can always choose allocation \( X_{GS} \) by setting

\begin{equation} X_{GS}^{(1)} = \frac{1}{2}(3 - \sqrt{5}) \quad\quad X_{GS}^{(2)} = \frac{1}{2}(\sqrt{5}-1). \end{equation}

Is it an optimal non-stacking policy?

Larger Numbers of Cores (n>2)

Can we find policies similar to Golden Section for larger values of \(n\)?

Consider the case n=3:

\begin{equation} Y = \left(0,\frac{1}{3},\frac{1}{2},\frac{2}{3},1\right) \quad\quad Z = \left(0,\frac{1}{4},\frac{1}{2},\frac{3}{4},1 \right) \end{equation}

Finite Allocation Policies

Suppose we start with allocation \(Y\)

threeY3.svg

Finite Allocation Policies

Suppose we start with allocation \(Y\)

threeY2.svg

Finite Allocation Policies

Suppose we start with allocation \(Y\)

threeY1.svg

Finite Allocation Policies

Suppose we start with allocation \(Z\)

threeZ3.svg

Finite Allocation Policies

Suppose we start with allocation \(Z\)

threeZ2.svg

Finite Allocation Policies

Suppose we start with allocation \(Z\)

threeZ1.svg

Concluding Remarks

Complications of Asynchronous Function Evaluations

Future Research